# How I Built an AI System to Reduce Healthcare No-Shows Using Flask, Random Forest & SimPy
*A walkthrough of my final year project, from problem statement to working simulation*
## The Problem I Wanted to Solve
Anyone who has visited a clinic knows the frustration — long wait times, overbooked doctors, and yet somehow, empty slots because patients didn't show up.
No-shows are one of the biggest inefficiencies in healthcare. Clinics lose revenue, doctors waste time, and patients who actually need that slot can't get it.
I wanted to build something that tackles this with a data-driven approach. The result: an AI-Based Healthcare Appointment Scheduling Optimization System — my final year project built with Python, Flask, scikit-learn, and SimPy.
Here's how I built it, what I learned, and what I'd do differently.
## What the System Does
At its core, the system does three things:
- Predicts which patients are likely to miss their appointment (no-show prediction)
- Uses that prediction to assign slots smartly (priority-based scheduling)
- Simulates a full clinic day to prove the approach actually works (SimPy simulation)
There are two portals:
- A Patient Portal where patients register, book appointments, and see their no-show risk
- An Admin Dashboard where clinic staff manage doctors, generate slots, and run simulations
## Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.11, Flask 3.0 |
| Database | SQLite + SQLAlchemy ORM |
| Machine Learning | scikit-learn (Random Forest) |
| Simulation | SimPy (Discrete-Event) |
| Frontend | Bootstrap 5, Chart.js |
| Data | pandas, numpy |
## Part 1: The No-Show Predictor
This is the heart of the project.
I trained a Random Forest Classifier to predict the probability that a patient will miss their appointment. The model outputs a score between 0 and 1, which I then bucket into three risk levels:
- LOW — probability < 40%
- MEDIUM — probability between 40–70%
- HIGH — probability ≥ 70%
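The bucketing step is simple to sketch. `risk_tier` is an illustrative name, not necessarily what the project calls it:

```python
def risk_tier(prob: float) -> str:
    """Map a no-show probability (0-1) to the three risk tiers above."""
    if prob >= 0.70:
        return "HIGH"
    elif prob >= 0.40:
        return "MEDIUM"
    return "LOW"
```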
Features used:
- previous_no_shows (how many times they've missed before)
- days_until_appointment (further away = higher risk)
- appointment_hour (early morning slots have higher no-show rates)
- day_of_week (Mondays and Fridays are worse)
- age
- gender
- reminder_sent (did they get a reminder?)
- distance_km (how far they live from the clinic)
Model config:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=8,
    class_weight='balanced',  # important — no-shows are a minority class
)
```
I used class_weight='balanced' because no-shows are naturally less common than shows. Without this, the model would just learn to predict "will show up" for everyone and get high accuracy while being useless.
Training data:
I generated 1,200 synthetic patient records using a custom generate_data.py script. Obviously, real hospital data would be better — but for a final year project, synthetic data with realistic distributions works well enough to demonstrate the concept.
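The post doesn't show the generator itself, but a minimal sketch of what a script like `generate_data.py` might do looks like this. The column names follow the feature list above; the distributions and coefficients are my assumptions, chosen only so the label correlates with the features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1200  # matches the 1,200 synthetic records mentioned above

df = pd.DataFrame({
    "previous_no_shows":      rng.poisson(0.8, n),
    "days_until_appointment": rng.integers(0, 31, n),
    "appointment_hour":       rng.integers(8, 18, n),
    "day_of_week":            rng.integers(0, 7, n),
    "age":                    rng.integers(18, 85, n),
    "gender":                 rng.integers(0, 2, n),
    "reminder_sent":          rng.integers(0, 2, n),
    "distance_km":            rng.uniform(0.5, 40, n).round(1),
})

# Give the label realistic structure: more past no-shows, longer lead
# times, and longer distances raise risk; reminders lower it.
logit = (0.6 * df["previous_no_shows"]
         + 0.04 * df["days_until_appointment"]
         + 0.03 * df["distance_km"]
         - 0.8 * df["reminder_sent"]
         - 2.0)
df["no_show"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
```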
## Part 2: Priority-Based Slot Allocation
Once I have the no-show probability, I use it to compute a priority score for each booking request:
Score = 0.5 × (urgency / 5) + 0.3 × (wait_days / 30) + 0.2 × (1 - no_show_prob)
Breaking this down:
- Urgency (50% weight) — a patient with a critical condition gets priority
- Wait time (30% weight) — patients waiting longer get bumped up
- Reliability (20% weight) — lower no-show probability = more trustworthy booking
The system then assigns the highest-priority patient to the best available slot.
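As a sketch, the scoring formula translates directly into a small function. `priority_score` is an illustrative name, and capping `wait_days` at 30 is my assumption so that term stays in [0, 1]:

```python
def priority_score(urgency: int, wait_days: int, no_show_prob: float) -> float:
    """Weighted score from the formula above; higher means scheduled first.

    urgency is on a 1-5 scale; wait_days is capped at 30 days.
    """
    return (0.5 * (urgency / 5)
            + 0.3 * (min(wait_days, 30) / 30)
            + 0.2 * (1 - no_show_prob))
```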
### Overbooking Strategy
This is where it gets interesting. Based on the risk tier:
- HIGH risk (≥70%): The slot stays open after booking — another patient can fill it if needed
- MEDIUM risk (40–70%): Booked normally, but a reminder flag is set
- LOW risk (<40%): Normal booking, slot is closed
This is a simplified version of how airlines overbook flights — except here, we're trying to ensure sick people actually get seen, not maximize revenue.
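A minimal sketch of how those tiers could map onto scheduler actions. The function and flag names are illustrative, not taken from the project:

```python
def slot_policy(no_show_prob: float) -> dict:
    """Decide how a slot is handled based on the booking's risk tier."""
    if no_show_prob >= 0.70:
        # HIGH: slot stays open after booking so another patient can fill it
        return {"tier": "HIGH", "keep_slot_open": True, "send_reminder": False}
    if no_show_prob >= 0.40:
        # MEDIUM: booked normally, but flagged for a reminder
        return {"tier": "MEDIUM", "keep_slot_open": False, "send_reminder": True}
    # LOW: normal booking, slot is closed
    return {"tier": "LOW", "keep_slot_open": False, "send_reminder": False}
```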
## Part 3: SimPy Simulation
The ML model tells us who is likely to no-show. But does the overall strategy actually improve clinic efficiency? That's where SimPy comes in.
SimPy is a Python library for discrete-event simulation. I used it to simulate an entire 8-hour clinic day.
What the simulation models:
- Patients arriving at scheduled times
- Doctors processing appointments (with variable duration)
- No-shows happening at a defined rate
- Queue buildup and wait times
Comparing baseline vs. optimized:
| Metric | Baseline | AI-Optimized |
|---|---|---|
| No-show rate | 25% | ~10% effective |
| Avg wait time | Higher | Lower |
| Doctor utilization | Lower | Higher |
| Patients seen | Fewer | More |
The simulation confirms that the overbooking + priority strategy meaningfully improves throughput and reduces wasted slots.
## Project Structure
```
healthcare_scheduler/
├── app.py                     # Main Flask application
├── config.py
├── seed_db.py                 # Populates DB with sample data
├── RUN_PROJECT.bat            # One-click Windows launcher
│
├── ai_modules/
│   ├── no_show_predictor.py   # Random Forest model
│   ├── scheduler.py           # Priority slot allocator
│   └── simulation.py          # SimPy simulation
│
├── models/                    # SQLAlchemy DB models
├── routes/                    # Flask API endpoints
├── templates/                 # HTML templates
└── data/
    └── generate_data.py       # Synthetic dataset generator
```
## How to Run It Locally (Windows)
**Option 1:** Just double-click `RUN_PROJECT.bat`. It handles everything automatically.

**Option 2:** Manual setup:

```bash
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python data\generate_data.py            # generate the synthetic dataset
python ai_modules\no_show_predictor.py  # train the Random Forest model
python seed_db.py                       # populate the database
python app.py                           # start the Flask server
```
Then open `http://127.0.0.1:5000` in your browser.
Demo credentials:
- Patient: `ravi@mail.com` / `pass123`
- Admin: `http://127.0.0.1:5000/admin/`
---
## What I Learned
1. The ML pipeline is the easy part.
Training the model took a few hours. Getting Flask, SQLAlchemy, and the ML model to work together cleanly took much longer. Integration is where real projects live.
2. Synthetic data has real limits.
My model performs well on my synthetic test set. Whether it would hold up on real patient data is a completely different question. Real-world class imbalance, missing values, and biases would make this much harder.
3. SimPy is underrated.
Most developers have never heard of discrete-event simulation. But for modeling anything with queues, arrivals, and service times — clinics, call centers, manufacturing lines — SimPy is incredibly powerful and worth learning.
4. `class_weight='balanced'` matters.
Before I added this, my model had 85% accuracy but was nearly useless — it just predicted "will show up" every time. Balanced class weights fixed this. Always check your class distribution before celebrating accuracy scores.
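A tiny demonstration of the trap, with made-up numbers (15% no-shows here, purely for illustration): a model that always predicts "will show up" scores 85% accuracy while catching zero no-shows.

```python
import numpy as np

# 1,000 appointments where only 15% are no-shows (class 1)
y_true = np.array([1] * 150 + [0] * 850)

# A "model" that always predicts "will show up" (class 0)
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()         # 0.85, looks respectable
recall_no_show = y_pred[y_true == 1].mean()  # 0.0, catches no no-shows at all

print(accuracy, recall_no_show)
```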
---
## What I'd Improve With More Time
- **Real dataset** — The [Kaggle Healthcare No-Show dataset](https://www.kaggle.com/joniarroba/noshowappointments) has 110,000 real records. Training on that would make the model actually meaningful.
- **Cross-validation & hyperparameter tuning** — I used defaults mostly. GridSearchCV would squeeze more performance out of the model.
- **Better features** — Weather on appointment day, insurance type, appointment type (follow-up vs. new patient) are all predictive in research literature.
- **Deploy it** — Currently Windows-only. Dockerizing it and deploying to Render or Railway would make it actually accessible.
- **Send real reminders** — Right now the "reminder_sent" flag is manual. Integrating Twilio or email would make the overbooking strategy actually work end-to-end.
---
## GitHub
The full source code is here: **https://github.com/ManishKumar981/-healthcare-scheduler**
If you found this useful, a ⭐ on the repo goes a long way!
---
*Thanks for reading. If you have questions about the ML approach, the SimPy simulation, or the Flask architecture — drop them in the comments. Happy to discuss.*