
Andreas Bergström


The LLM-shaped hole in your XGBoost pipeline

Every team shipping a tabular ML model eventually gets the same question from leadership: "Have we tried GPT for this?" The honest answer is that gradient-boosted trees still beat everything else on tabular prediction, and that hasn't changed just because LLMs got bigger. TabPFN, FT-Transformer, and friends are interesting, but the short list of serious alternatives stays short.

There is an LLM-shaped hole in most XGBoost pipelines — it's just not where most people put it. The pattern that works is LLMs upstream of trees: embeddings as features, text-to-features extraction over unstructured fields, and tool-using agents to fill enrichment gaps. None of these touch the prediction path; they all feed signal into the model that does.
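
To make the first pattern concrete, here is a minimal sketch of embeddings-as-features. The data, the `all-MiniLM-L6-v2` encoder, and the hyperparameters are illustrative placeholders, not the setup from the full post; the point is only the shape of the pipeline, where the embedding step runs offline and the trees stay in the prediction path.

```python
import numpy as np
import xgboost as xgb
from sentence_transformers import SentenceTransformer

# Hypothetical inputs: a tabular feature matrix plus a free-text
# column (e.g. support-ticket descriptions) for the same rows.
X_tabular = np.random.rand(1000, 12)            # existing numeric features
texts = ["ticket description placeholder"] * 1000  # unstructured text field
y = np.random.randint(0, 2, size=1000)          # binary target

# Embed the text once, as a batch preprocessing step; the LLM never
# sits in the prediction path, it only produces extra feature columns.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
text_features = encoder.encode(texts)              # shape: (1000, 384)

# Concatenate embeddings with the tabular features and let the
# trees decide what is signal.
X = np.hstack([X_tabular, text_features])

clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
clf.fit(X, y)
```

In practice you would check on a held-out set whether those extra embedding columns actually move the metric, which is exactly the "no, the embeddings don't help" case mentioned below.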

The full post walks through cost math, what each pattern is worth in practice, and the cases where the empirical answer turns out to be "no, the embeddings don't help."


Originally published at andreasbergstrom.dev — read the full post there.
