I recently posted an article here on dev.to documenting my experiences testing small and large language models. I'm an engineering technician, not a software engineer, developer, programmer, or coder. Since dev.to is a platform for people in the software field, it got me wondering whether AI is really software, and whether I should be posting my AI articles here at all.
AI is so intertwined with software these days that it seems like an odd question. When we speak casually, we answer “yes”: large language models (LLMs) and small language models (SLMs) are built by engineers, shipped through software pipelines, and run by programs. Yet at the level of a systems programmer, compiler writer, or hardware designer, the question is far from trivial. What exactly is an AI model, what does it contain, and does it satisfy the technical definition of software?
For the purpose of hardware and systems design, software is executable logic—a sequence of instructions (machine code, bytecode, or interpreted source) that a processor can execute. It embodies control flow: branches, loops, calls, and returns. Anything that resolves to an instruction stream that a CPU or accelerator can run qualifies as software. Conversely, a file that merely stores data, even if it is bundled with an application, does not truly meet this definition.
In the purest sense, a trained small or large language model is typically distributed as a file with an extension such as .safetensors, .gguf, or .pth. Inside are large multidimensional arrays of numbers—weights and biases learned during training. These numbers parameterize a fixed mathematical function: they determine how strongly one neuron influences another, how each feature is weighted, and how signals propagate through the layers.
Crucially, the model file contains no control flow. There are no conditionals, loops, or instructions that say “if X then Y.” It is not an algorithm; it is a parameterization of an algorithm that lives elsewhere. Formats like safetensors are deliberately designed to store only raw data and metadata, explicitly forbidding embedded executable code to prevent remote‑code‑execution attacks. This design choice underscores the fact that models are intended to be inert data, not executable artifacts.
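To see just how inert the file is, here is a minimal sketch that dumps the header of a safetensors file using nothing but Python's standard library. The path model.safetensors is a placeholder for whatever model you have downloaded locally; the format itself is just a length-prefixed JSON header followed by a flat buffer of raw tensor bytes.

```python
import json
import struct

# Minimal sketch: inspect a .safetensors file with no ML framework at all.
# The format is an 8-byte little-endian header length, a JSON header that maps
# tensor names to dtype/shape/byte-offsets, and then a flat buffer of raw bytes.
# "model.safetensors" is a placeholder path for whatever model you have locally.
with open("model.safetensors", "rb") as f:
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))

for name, info in header.items():
    if name == "__metadata__":  # optional free-form metadata block
        continue
    print(f"{name}: dtype={info['dtype']}, shape={info['shape']}")

# Every entry is a named array of numbers. There are no opcodes, no branches,
# and no entry point. Nothing here is something a CPU could execute.
```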
Let's ask whether a model can be executed directly. A CPU cannot interpret a .gguf file; a GPU cannot run it without a driver; you cannot make the file executable (chmod +x) and launch it. To produce output, the model must be loaded into an inference engine: software, written in C++, Python, Rust, or another language, that knows the model's architecture, performs the tensor operations, schedules work, and manages memory.
All the logic that multiplies matrices, applies activation functions, and manages caches lives in this runtime, not in the model. The same model file can behave in dramatically different ways depending on the runtime, hardware, precision, or quantization scheme used. This dependency draws a clear line between data (the model) and the software (the inference engine) running that model.
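As a rough sketch of that relationship, the snippet below loads a model through llama-cpp-python, which is only one runtime among many; model.gguf is a placeholder for any locally downloaded GGUF file. Every matrix multiply happens inside the library; the file contributes only the numbers.

```python
# Sketch: the model file does nothing until a runtime interprets it.
# llama-cpp-python is just one possible inference engine, and "model.gguf"
# is a placeholder for any locally downloaded GGUF model.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # the runtime parses the file and allocates the tensors
out = llm("Explain what a GGUF file contains.", max_tokens=64)
print(out["choices"][0]["text"])
```

Uninstall the library and that same file still sits on disk, a few gigabytes of numbers that can do nothing on their own.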
Neural networks blur the classic boundary between data and code. In conventional programs, behavior is encoded explicitly in conditionals and loops. In a neural net, behavior is encoded implicitly in numerical weights: tweaking millions of numbers changes the system's output much as rewriting thousands of lines of code changes a program's behavior.
Nevertheless, the weights describe what values to use, not how to compute them. The algorithm—the “how”—is fixed and external; the weights are merely coefficients inside that algorithm. That is why two different runtimes can load the same model and, up to numerical precision, produce the same results while employing completely different execution strategies. The software determines the execution; the model supplies the parameters.
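A toy illustration of that split, using a single dense layer in NumPy rather than any real architecture: the function below is the software, and it never changes; only the arrays passed into it do.

```python
import numpy as np

def forward(x, W, b):
    # The algorithm: a fixed matrix multiply followed by a fixed nonlinearity.
    # This function is the only place the "how" lives.
    return np.tanh(W @ x + b)

# The "model": nothing but numbers. Swapping these arrays changes the system's
# behavior without touching a single line of code.
W = np.array([[0.5, -1.2],
              [0.3,  0.8]])
b = np.array([0.1, -0.4])

print(forward(np.array([1.0, 2.0]), W, b))
```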
Much of the perception that these models are software stems from one of their most common applications: developer‑assistant tools like Claude Opus, Qwen, or Copilot. But generating source code is just one use of a general‑purpose statistical model. Whether a model writes Python, translates languages, predicts protein structures, or classifies images does not alter its internal structure. A model that outputs code is no more “software” than a CSV file that happens to contain code snippets.
Take a model file and compute its checksum—leave every byte untouched. Now change only the surrounding stack: swap one inference engine for another that reads the same format, move from CUDA to CPU, quantize the weights at load time instead of running at full precision, or switch from AVX2 to AVX‑512. The checksum confirms the model is byte-for-byte identical, yet latency, memory usage, and even numerical results can vary by orders of magnitude. The only thing that changed is the executable logic, which is further evidence that the model itself is NOT software.
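That experiment is easy to reproduce. A quick sketch using Python's hashlib, again with a placeholder file name:

```python
import hashlib

# Hash the model file before and after swapping runtimes, hardware, or
# instruction sets. The digest never changes, because nothing in the
# execution environment is part of the file itself.
def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("model.gguf"))  # placeholder path
```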
In practice, models are versioned, distributed, cached, deployed, and rolled back just like any other software component. They live in repositories, have compatibility constraints, and are monitored for regressions. But they are not software.
An AI model, in isolation, is data—a trained numerical artifact that encodes the parameters of a mathematical function. It contains no executable logic, control flow, or instructions. Only when an inference engine (software) interprets those numbers does the model become part of a software system.
This distinction matters for correctness, security, auditing, and formal reasoning. It reminds us that modern AI does not replace algorithms with magic; it replaces hand‑written rules with learned parameters that are still evaluated by traditional code.
So, having raised the question myself, I came to the conclusion that no, an SLM or LLM is not software; it is a trained set of numbers that becomes part of a software system only when interpreted by executable code.
So thanks to Jess and company for letting me post a non-software article here! Hope you found this interesting.
Ben Santora - January 2026