What Is AutoML?
AutoML systems, such as H2O AutoML, are designed to automate the process of building machine learning models.
You provide:
- A dataset
- A target column
- Some patience
The system then:
- Trains multiple models (GBMs, XGBoost, Random Forests, Deep Neural Networks, and ensembles)
- Performs internal hyperparameter tuning
- Applies cross-validation
- Generates a leaderboard of results
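A minimal Python sketch of that workflow with H2O AutoML (the file name `train.csv` and the column name `target` are placeholders, not part of the benchmarks below):

```python
# Minimal H2O AutoML sketch: load data, train up to 20 models with 3-fold CV,
# and inspect the resulting leaderboard.
import h2o
from h2o.automl import H2OAutoML

h2o.init()

train = h2o.import_file("train.csv")          # placeholder dataset
train["target"] = train["target"].asfactor()  # classification targets must be factors

aml = H2OAutoML(max_models=20, nfolds=3, seed=1)
aml.train(y="target", training_frame=train)

print(aml.leaderboard.head())  # leaderboard of all trained models
```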
AutoML focuses on achieving high predictive accuracy, often at the cost of:
- Model complexity
- Interpretability
- Fine-grained control
- Computational resources
This trade-off is intentional. AutoML is built to prioritize performance over simplicity.
What Is SmartKNN?
SmartKNN takes a different approach. Unlike AutoML, it does not train dozens of models.
Instead, SmartKNN emphasizes:
- A single, optimized predictive model
- Smarter neighborhood selection
- Weighted distances for predictions
- Controlled hyperparameter search
- Consistent and stable inference
SmartKNN trades brute-force model diversity for simplicity, structure, and inductive bias, achieving competitive results with fewer models and more interpretability.
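SmartKNN's internals are its own; as a rough illustration of the distance-weighting idea it builds on, here is a generic scikit-learn sketch (not SmartKNN's API):

```python
# Generic distance-weighted k-NN: closer neighbors get proportionally larger
# weights via weights="distance". Feature scaling matters for distance models.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=15, weights="distance"),
)
# model.fit(X_train, y_train); model.predict(X_test)
```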
Experimental Setup
Classification
- Cross-validation: 3-fold CV
- H2O AutoML: nfolds = 3 and 20 models per dataset
- SmartKNN: 3-fold CV
- Metrics: Accuracy, F1-score
Regression
- H2O AutoML: ~20 models per dataset
- SmartKNN: Grid search (k, weight threshold tuned)
- Metrics: MSE, R²
Note: No formal speed benchmarks were recorded (see the timing observation in the notes below)
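For context, this is roughly what a 3-fold evaluation with these metrics looks like in scikit-learn; the dataset and model here are stand-ins, not the benchmark code itself:

```python
# 3-fold CV with accuracy and F1, mirroring the classification setup above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Stand-in binary dataset and model; the real benchmarks use the datasets listed below
X, y = load_breast_cancer(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=15, weights="distance")

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1"])
print("Accuracy:", scores["test_accuracy"].mean())
print("F1:", scores["test_f1"].mean())
```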
Classification Benchmarks
- Nomao - [35k * 119]
| Model | Accuracy | F1 |
|---|---|---|
| H2O AutoML | 0.9728 | 0.9667 |
| SmartKNN | 0.9569 | 0.9471 |
- APS Failure - [76k * 171]
| Model | Accuracy | F1 |
|---|---|---|
| H2O AutoML | 0.9696 | 0.9347 |
| SmartKNN | 0.9431 | 0.8661 |
- Adult - [48k * 15]
| Model | Accuracy | F1 |
|---|---|---|
| H2O AutoML | 0.8404 | 0.7975 |
| SmartKNN | 0.8240 | 0.7567 |
- Click Prediction Small - [40k * 10]
| Model | Accuracy | F1 |
|---|---|---|
| H2O AutoML | 0.7142 | 0.5967 |
| SmartKNN | 0.8036 | 0.5323 |
- Bank Marketing - [45k * 17]
| Model | Accuracy | F1 |
|---|---|---|
| H2O AutoML | 0.8940 | 0.7819 |
| SmartKNN | 0.8969 | 0.7074 |
Regression Benchmarks
- Buzzword Twitter - [583k * 78]
| Model | MSE ↓ | R² ↑ |
|---|---|---|
| H2O AutoML | 23977.62 | 0.9361 |
| SmartKNN | 27939.95 | 0.9255 |
- Diamonds - [54k * 10]
| Model | MSE ↓ | R² ↑ |
|---|---|---|
| H2O AutoML (tuned) | 1789603.30 | 0.8874 |
| SmartKNN (tuned) | 1987190.50 | 0.8750 |
- California Housing - [20k * 10]
| Model | MSE ↓ | R² ↑ |
|---|---|---|
| H2O AutoML (tuned) | 2190147793 | 0.8398 |
| SmartKNN (tuned) | 3178661376 | 0.7676 |
- Fried - [40k * 11]
| Model | MSE ↓ | R² ↑ |
|---|---|---|
| H2O AutoML (tuned) | 1.1555 | 0.9530 |
| SmartKNN (tuned) | 1.5117 | 0.9385 |
Notes on the Benchmarks
We evaluated 9 datasets across both classification and regression. The full benchmarking process took around 7 hours using H2O AutoML, while SmartKNN completed the same tasks significantly faster.
Interpreting the Results
These benchmarks were designed to test how well a semi-tuned SmartKNN can compete with a modern tabular AutoML tool (H2O):
- SmartKNN was tuned only on k and the weight threshold. Parameters like alpha, beta, and gamma were left at their defaults, meaning there is potential for even higher accuracy with full tuning (a generic sketch of this kind of controlled grid search follows this list).
- These results demonstrate that SmartKNN can hold its ground against state-of-the-art models and tools, delivering competitive performance with far less computational overhead.
- On some classification datasets (e.g., Click Prediction Small and Bank Marketing), SmartKNN even outperformed H2O AutoML in accuracy, though H2O retained higher F1 scores in those cases.
- SmartKNN trades some accuracy for speed, but this benchmark demonstrates its strong baseline performance relative to a full model factory.
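As a generic illustration of that kind of controlled search (not SmartKNN's actual tuning code; its weight threshold, alpha, beta, and gamma are specific to that library), a small scikit-learn grid over k looks like this:

```python
# Controlled search: only k is tuned, mirroring the partial tuning described above.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in regression dataset; the benchmark datasets above are not bundled with sklearn
X, y = load_diabetes(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsRegressor(weights="distance")),  # distance-weighted neighbors
])

grid = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [5, 10, 15, 25, 50]},
                    scoring="r2", cv=3)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 4))
```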
Additional Notes
- The datasets used were mostly standard public datasets, not cherry-picked for favorable results.
- Benchmarks are fully reproducible and available on Kaggle and other public repositories.
- The goal was not to declare a winner, but to show that a carefully designed single model can compete against AutoML systems in tabular data scenarios.