Understanding KNN with Iris Dataset – A Beginner's Visual Guide
Introduction
As part of my ml-chronicles series, I decided to start things off with a foundational yet powerful algorithm: K-Nearest Neighbors (KNN).
Before diving in, I skimmed through some great GFG articles to grasp the theory, then followed a YouTube tutorial to see how it's applied in practice. I wanted this project to be more interactive, so I added several features to visualize the model's behavior and explore its evaluation metrics.
This post covers the following:
- Building a KNN model from scratch,
- Applying it to the Iris dataset,
- Visualizing the results,
- Testing with custom inputs, and
- Finding the best value of k.
Credits and resources:
GFG article: https://www.geeksforgeeks.org/k-nearest-neighbours/
YouTube tutorial: https://youtu.be/mpfU9n4MzBE?feature=shared
What is KNN?
KNN stands for K-Nearest Neighbors. It is often called a lazy learning algorithm because it doesn't train a model explicitly: instead of learning from the training set up front, it simply stores the data and does all the work at prediction time. Given a query point, it looks at the k nearest data points and returns the most common class among them.
If the dataset has significant outliers or noise, a higher k helps smooth out the predictions. However, a very high k can cause underfitting.
There are various ways to select a good value of k: cross-validation, the elbow method, preferring odd values of k (to avoid ties), etc.
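For illustration (this is not how I pick k later in the post, where I average accuracy over random splits), a cross-validation sweep with scikit-learn could look like this:

```python
# Illustrative sketch: choosing k with 5-fold cross-validation on Iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean cross-validated accuracy for this k
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k by cross-validation: {best_k}")
```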
Distances can be measured with several well-known metrics: Euclidean distance, Manhattan distance, Minkowski distance, etc.
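For reference, those metrics are easy to write down in NumPy (a generic sketch, not taken from my project code):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance: sqrt(sum of squared differences)
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Sum of absolute differences along each axis
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    # Generalization: p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1 / p)
```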
Steps involved:
1. Measure the distance (usually Euclidean) from the query to all training points.
2. Pick the k closest ones.
3. Vote on the class label.
KNN works best with normalized data and small to medium datasets.
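In code, those three steps fit in a few lines (a minimal sketch; it assumes the features and labels are NumPy arrays and the features are already normalized):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=5):
    # 1. Measure Euclidean distance from the query to every training point
    distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))
    # 2. Pick the indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # 3. Vote: return the most common class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]
```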
The Dataset: IRIS
The classic Iris dataset has:
- 150 samples across 3 classes: Setosa, Versicolor, and Virginica.
- 4 numerical features: Sepal Length, Sepal Width, Petal Length, and Petal Width.
- For my project, I removed the Id column and normalized all features using MinMaxScaler. I also mapped the class labels to numeric codes (1, 2, 3) for internal processing.
My Model Architecture
Preprocessing
- Normalized all feature columns to [0, 1].
- Mapped species to numeric values for classification.
- Used 3D scatter plots to visualize separability among classes.
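In pandas terms, the preprocessing is roughly the following (a sketch; the file and column names assume the usual Kaggle Iris.csv layout):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("Iris.csv").drop(columns=["Id"])

# Map species names to numeric codes (1, 2, 3) for internal processing
species_map = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}
df["Species"] = df["Species"].map(species_map)

# Scale all four feature columns to [0, 1]
feature_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
scaler = MinMaxScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])

# 3D scatter of three features, colored by class, to eyeball separability
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(df["PetalLengthCm"], df["PetalWidthCm"], df["SepalLengthCm"], c=df["Species"])
plt.show()
```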
Custom KNN from Scratch
I implemented KNN in a separate module (kNN_modules.py).
It includes:
- A distance function to measure closeness: the Euclidean distance formula is used.
- A KNN() function that predicts the class by applying statistics.mode() to the nearest neighbors' labels.
- An optional print statement to view the selected neighbors, which helped me build intuition about how predictions are formed.
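Its shape is roughly this (a simplified sketch; function and parameter names here are illustrative, and the actual code lives in the repo linked below):

```python
# kNN_modules.py (sketch)
import math
import statistics

def distance(p1, p2):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def KNN(X_train, y_train, query, k=5, verbose=False):
    # Sort training points by distance to the query and keep the k nearest
    ranked = sorted(zip((distance(x, query) for x in X_train), y_train))
    neighbors = ranked[:k]
    if verbose:
        # Optional peek at which neighbors drove the prediction
        print("Nearest neighbors (distance, label):", neighbors)
    # Majority vote using statistics.mode()
    return statistics.mode(label for _, label in neighbors)
```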
Visualizations: The following visuals were implemented, and I'm attaching the results.
Confusion Matrix
Used sklearn’s ConfusionMatrixDisplay to evaluate predictions visually for a fixed k=5.
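Something along these lines (assuming y_test and y_pred, the true labels and the k=5 predictions from the custom KNN, already exist):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Compare true labels against the custom KNN predictions at k=5
ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=["Setosa", "Versicolor", "Virginica"]
)
plt.show()
```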
Accuracy Heatmap
I ran 10 random train-test splits (30% test) and varied k from 1 to 15.
Then visualized accuracy trends across combinations using a heatmap.
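The sweep looks roughly like this (a sketch reusing the knn_predict-style function from earlier; X and y are the scaled features and numeric labels):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

records = []
for run in range(10):                      # 10 random 70/30 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=run)
    for k in range(1, 16):                 # k from 1 to 15
        preds = [knn_predict(X_tr, y_tr, q, k) for q in X_te]
        acc = np.mean(np.array(preds) == y_te)
        records.append({"run": run, "k": k, "accuracy": acc})

results = pd.DataFrame(records)

# Heatmap of accuracy for every (run, k) combination
sns.heatmap(results.pivot(index="run", columns="k", values="accuracy"), annot=True)
plt.show()
```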
Best k – Average Accuracy Plot
I computed the average accuracy for each k value across runs, then picked best_k using .idxmax().
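Given the results DataFrame from the sweep above, that boils down to:

```python
# Average accuracy per k across all runs, then the k with the highest mean
avg_acc = results.groupby("k")["accuracy"].mean()
best_k = avg_acc.idxmax()
print(f"best k = {best_k} (average accuracy {avg_acc.max():.3f})")

# Average-accuracy-vs-k plot
avg_acc.plot(marker="o")
plt.xlabel("k")
plt.ylabel("average accuracy")
plt.show()
```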
Export
Saved all accuracy results to CSV for later inspection.
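For example (the filename here is just a placeholder):

```python
# Persist the full accuracy table for later inspection
results.to_csv("knn_accuracy_results.csv", index=False)
```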
Interactive Query Input
One of my favorite parts was enabling this command-line prediction:
- The user can input 4 features like: SepalL SepalW PetalL PetalW
- The input is scaled and classified using the custom KNN function
- It prints the predicted species!
(It might skip in non-interactive environments — I handled that too.)
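A rough sketch of that input handling (the scaler and prediction function are the ones from the earlier sketches; the EOFError guard is what keeps it from crashing in non-interactive runs):

```python
species_names = {1: "Setosa", 2: "Versicolor", 3: "Virginica"}

try:
    raw = input("Enter SepalL SepalW PetalL PetalW: ")
    # Scale the raw measurements with the scaler fitted during preprocessing
    query = scaler.transform([[float(v) for v in raw.split()]])[0]
    label = knn_predict(X_train, y_train, query, k=best_k)
    print("Predicted species:", species_names[label])
except EOFError:
    # No stdin available (e.g. some notebook/CI environments), so skip the demo
    print("Interactive input not available; skipping query demo.")
```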
Results & Observations
The model performed best at k = 9, with average accuracy peaking there.
Visualizations helped reveal class boundaries and overlaps.
Writing the KNN logic from scratch gave me a hands-on view of its behavior.
Features I Added
- Normalization using MinMaxScaler
- Confusion Matrix using sklearn
- Heatmap of accuracy vs. k
- Interactive command-line query support
- CSV export for tracking experimental results
My code link: https://github.com/Amruta-25/ML_chronicles_KNN.git
This was Day 1 of my ml-chronicles.
If you're just starting out in ML — try building this from scratch.
What would you add or change in this project?