What if you could upload a photo of a skin lesion and get an AI-powered prediction in under 2 seconds — no signup, no data stored, completely free?
That's exactly what I built for my capstone thesis. SKIN is a web app that runs two CNN models: a 7-class skin lesion classifier trained on the HAM10000 dataset, and a binary monkeypox detector. It also uses Groq's Llama 4 Scout to generate plain-language medical explanations for each prediction.
Here's how I built it.
The Problem
Skin diseases are one of the most common reasons people visit a doctor, but access to dermatologists is limited in many parts of the world. Early detection of conditions like melanoma can be life-saving, yet most people don't know what to look for.
I wanted to build something that could give people a starting point — not a diagnosis, but an informed nudge to see a doctor.
Important disclaimer: This is an educational tool. It's not a medical device and should never replace a real dermatologist.
The Stack
| Layer | Tech |
|---|---|
| Backend | Flask (Python) |
| ML Inference | TensorFlow Lite |
| AI Explanations | Groq API (Llama 4 Scout 17B) |
| Frontend | Tailwind CSS, Alpine.js |
| Charts | Chart.js |
| Deployment | Render (free tier) |
I deliberately kept the stack simple: no React, no complex build pipeline. Just Jinja2 templates with Alpine.js for interactivity and Tailwind for styling. The entire app is a single app.py file.
The Models
Skin Lesion Classifier (HAM10000)
The HAM10000 dataset contains 10,015 dermatoscopic images of 7 types of skin lesions:
- akiec — Actinic Keratosis
- bcc — Basal Cell Carcinoma
- bkl — Benign Keratosis
- df — Dermatofibroma
- mel — Melanoma
- nv — Melanocytic Nevus (Mole)
- vasc — Vascular Lesion
I trained a MobileNetV2-based CNN and converted it to TFLite for fast inference. The final model is just 2.7 MB — small enough to load instantly on a free-tier server.
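At inference time the classifier returns a 7-way probability vector in the class order above; turning that into a label and a confidence percentage is a tiny helper. A sketch (names are illustrative, not the project's actual code):

```python
# Map a softmax probability vector back to a class code + confidence.
# CLASS_CODES follows the HAM10000 ordering listed above.
CLASS_CODES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def top_prediction(probabilities, labels=CLASS_CODES):
    # Index of the highest-probability class
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best] * 100.0
```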
Overall accuracy: 71.64%. Not clinical-grade, but solid for a thesis project. The biggest challenge was class imbalance — melanocytic nevi dominated the dataset (~67% of all images), which made the model biased toward predicting moles.
Monkeypox Detector
A separate binary classifier that distinguishes monkeypox lesions from other skin conditions. This one hits 95% accuracy — binary problems are inherently easier, and the visual features of monkeypox are quite distinct.
The Architecture
Here's the interesting part. The app loads TFLite models at startup and keeps them in memory:
def get_model(model_type: str):
    # model_path and input_size come from a per-model config (elided here)
    global loaded_models
    if model_type not in loaded_models:
        interpreter = tf.lite.Interpreter(model_path=model_path, num_threads=4)
        # Fix the input shape to a single image before allocating tensors
        interpreter.resize_tensor_input(
            interpreter.get_input_details()[0]['index'],
            [1, input_size, input_size, 3]
        )
        interpreter.allocate_tensors()
        loaded_models[model_type] = {
            'interpreter': interpreter,
            'input_details': interpreter.get_input_details(),
            'output_details': interpreter.get_output_details(),
            'type': 'tflite'
        }
    return loaded_models[model_type]
First prediction is slow (cold start), but subsequent predictions are near-instant because the interpreter is already allocated.
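One way to hide that cold start is to trigger the model loads in a background thread when the process boots, so the first real request finds the interpreters already allocated. A minimal sketch (warm_up and the loader callback are my names, not the app's):

```python
import threading

def warm_up(loader, model_types):
    # Load each model off the request path at startup; loader is assumed
    # to be a function like get_model above
    def _load():
        for mt in model_types:
            loader(mt)
    t = threading.Thread(target=_load, daemon=True)
    t.start()
    return t
```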
Privacy by Design
One thing I'm proud of: no image ever touches disk. The uploaded file is read into memory, processed by the model, and discarded:
image_bytes = file.read()
pil_image = Image.open(io.BytesIO(image_bytes))
product = predict_image(pil_image, model_type, image_bytes=image_bytes)
# image_bytes goes out of scope and gets garbage collected
No database. No file system writes. No logging of uploads. If the server crashes, there's zero user data to leak.
Adding AI Explanations with Groq
Raw CNN output like "bcc — 73.2% confidence" isn't useful to most people. So I added Groq's free API to generate plain-language explanations:
def get_medgemma_explanation(image_bytes, cnn_label, confidence):
    # Encode the upload as base64 for the multimodal data URL
    img_b64 = base64.b64encode(image_bytes).decode()
    prompt = (
        f"A CNN skin disease classifier detected: {cnn_label} "
        f"({confidence:.1f}% confidence).\n"
        "In plain text only, write 3 short paragraphs:\n"
        "1. What this condition is.\n"
        "2. What visual signs to look for.\n"
        "3. Recommended next steps for the patient."
    )
    response = groq_client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        max_tokens=200,
    )
    return response.choices[0].message.content.strip()
Groq's free tier is fast enough for a demo app — responses come back in ~1 second. The Llama 4 Scout model is multimodal, so it can actually look at the image and correlate the CNN prediction with visual features.
If the API key isn't set, the feature gracefully degrades — no explanation shown, no error.
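That degradation can be a small wrapper: check for the key up front and swallow API failures, so a Groq outage never breaks the core prediction flow. A sketch (the wrapper name and structure are my assumptions, not the app's actual code):

```python
import os

def with_graceful_fallback(explain_fn, *args):
    # Skip the LLM call entirely when no key is configured
    if not os.environ.get("GROQ_API_KEY"):
        return None
    try:
        return explain_fn(*args)
    except Exception:
        # An API failure degrades to "no explanation", never an error page
        return None
```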
Security Considerations
Even for a thesis project, I didn't want to cut corners on security:
- CSRF protection — manual token-based validation using Python's secrets module (no extra dependencies)
- SRI hashes — all CDN scripts carry integrity attributes with SHA-384 hashes
- Security headers — HSTS, X-Content-Type-Options, X-Frame-Options
- Rate limiting — 3 analyses per session to prevent abuse
- Input validation — file type whitelist, 10 MB size limit, 4096x4096 pixel cap
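The CSRF piece really is just the secrets module. A sketch of the token flow (function names are mine, not necessarily the app's):

```python
import secrets

def generate_csrf_token(session: dict) -> str:
    # Issue one random token per session and reuse it across forms
    if "csrf_token" not in session:
        session["csrf_token"] = secrets.token_hex(32)
    return session["csrf_token"]

def validate_csrf_token(session: dict, submitted: str) -> bool:
    # compare_digest avoids leaking information through timing differences
    token = session.get("csrf_token", "")
    return bool(token) and secrets.compare_digest(token, submitted)
```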
@app.after_request
def set_security_headers(response):
    response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'SAMEORIGIN'
    return response
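The rate limit is similarly dependency-free — a counter kept in the session. A sketch (names and the exact cap check are assumptions):

```python
MAX_ANALYSES = 3  # matches the per-session cap described above

def allow_analysis(session: dict) -> bool:
    # Each session gets MAX_ANALYSES predictions; further requests are refused
    count = session.get("analysis_count", 0)
    if count >= MAX_ANALYSES:
        return False
    session["analysis_count"] = count + 1
    return True
```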
Deploying on Render (Free Tier)
The app runs on Render's free plan with a single Gunicorn worker. TFLite keeps memory usage low enough to stay within limits:
# render.yaml
services:
  - type: web
    name: skin-disease-detection
    runtime: python
    plan: free
    startCommand: cd frontend && gunicorn -c ../gunicorn_config.py app:app
    healthCheckPath: /health
Cold starts take ~30 seconds (TensorFlow import + model loading), but once warm, predictions are fast.
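The gunicorn_config.py referenced above isn't shown here; a plausible single-worker configuration for a 512 MB free-tier instance might look like this (all values are my assumptions, not the project's actual file):

```python
# gunicorn_config.py — hypothetical values, not the project's actual file
workers = 1             # one worker so the TFLite interpreters load only once
threads = 4             # handle concurrent requests without extra memory
timeout = 120           # tolerate the ~30 s cold start without killing the worker
bind = "0.0.0.0:10000"  # Render's default internal port
```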
What I Learned
Class imbalance is the real boss fight. My model is great at detecting moles (85% accuracy) but struggles with rare conditions like dermatofibroma (31%). Oversampling and class weights help, but don't solve it completely.
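The "balanced" class-weight heuristic weights each class by n_samples / (n_classes * count_class) — the same formula scikit-learn's compute_class_weight uses — and can be computed with the stdlib alone. A sketch, not the actual training script:

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight_c = n_samples / (n_classes * count_c): rare classes get
    # proportionally larger weights during training
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With HAM10000's ~67% nv share, nv's weight lands well below 1 while rare classes like df climb far above it.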
TFLite is underrated for web apps. Going from a 58 MB H5 model to a 4.8 MB TFLite model with minimal accuracy loss was a game-changer for deployment.
LLM explanations add massive UX value. The Groq integration took ~30 lines of code but transformed the app from "here's a label and a number" to something actually useful.
You don't need React for everything. Alpine.js + Tailwind + Jinja2 gave me a modern, responsive UI with zero build step. The entire frontend is server-rendered HTML with sprinkles of interactivity.
Try It
Live Demo — upload any skin image and get an instant prediction.
GitHub — star the repo if you found this useful!
The app is open source under MIT. Fork it, improve the models, add new conditions, or swap in your own CNN. PRs welcome.
This project was built as a capstone thesis under the College of Computer and Information Sciences (CCIS). If you're working on something similar or have questions about the training pipeline, drop a comment below!