
felipe muniz
I trained a 354M LLM alone and it outperforms GPT-2 Medium in epistemic calibration

(Figure: baseline comparison of calibration metrics across models)

No team. No institutional funding. No university affiliation.

Just me, a RunPod account with 5x H200 GPUs, and an architecture I have been building for the past year called ATIC (Adaptive Turing Intelligence Cognition).

ATIC is a geometric cognitive architecture based on Riemannian and toroidal manifolds. Instead of just predicting the next token, every forward pass produces aleatoric and epistemic uncertainty estimates, a 5D manifold position, and calibrated confidence scores. The model knows where it is in cognitive space, and it knows when it does not know.
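To make the aleatoric/epistemic split concrete, here is a minimal sketch of one standard way to separate the two from stochastic forward passes (a Monte Carlo mutual-information decomposition). The function names and the MC-sampling approach are illustrative assumptions, not ATIC's actual implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_estimates(mc_logits):
    """Decompose uncertainty from S stochastic forward passes.

    mc_logits: array of shape (S, V) - logits over a vocab of size V,
    one row per stochastic pass (e.g. with dropout left on).
    Returns (total, aleatoric, epistemic) in nats.
    """
    probs = softmax(mc_logits)                                   # (S, V)
    mean_p = probs.mean(axis=0)                                  # predictive distribution
    total = -(mean_p * np.log(mean_p + 1e-12)).sum()             # entropy of the mean
    aleatoric = -(probs * np.log(probs + 1e-12)).sum(-1).mean()  # expected entropy
    epistemic = total - aleatoric                                # mutual information
    return total, aleatoric, epistemic
```

If every pass produces identical logits, the epistemic term collapses to zero and all remaining uncertainty is aleatoric; disagreement between passes shows up as epistemic uncertainty.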

AletheionLLM-v2 is the first LLM trained end-to-end with this architecture: 354M parameters, trained on 1 billion tokens with 14 active loss functions, in fp32.

The evaluation was done on WikiText-103, a dataset the model never saw during training.

ECE of 0.0176, versus 0.0236 for GPT-2 Medium and 0.0241 for OPT-350M. Brier score of 0.1528, the best across all compared models. That is roughly a 25% reduction in calibration error relative to GPT-2 Medium on out-of-distribution data.
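For readers unfamiliar with these metrics, here is a minimal sketch of the standard binned Expected Calibration Error and the Brier score, as commonly computed for top-1 predictions. This is an illustration of the metrics themselves, not the exact evaluation code from the repo:

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: per-bin |accuracy - mean confidence|,
    weighted by the fraction of samples falling in each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = len(confidences)
    err = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            err += mask.sum() / total * gap
    return err

def brier(confidences, correct):
    """Brier score: mean squared error between predicted confidence
    and the 0/1 correctness outcome. Lower is better."""
    return np.mean((confidences - correct) ** 2)
```

A perfectly calibrated model has ECE of 0: within every confidence bin, the model is right exactly as often as it claims to be.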

The model does not just answer. It knows how much to trust its own answer.

Repo: github.com/gnai-creator/aletheion-llm-v2

Paper DOI: 10.13140/RG.2.2.11471.14241
