DEV Community

Janith Disanayake


What an LLM Contains

Downloading a model from Ollama or similar platforms (like Hugging Face, TensorFlow Hub, etc.) gives you access to more than just a file labeled “model.” These packages contain several components that allow the model to be executed, fine-tuned, or embedded in applications.

Let’s break it down into a clear structure, so you can understand, dissect, and even modify the model effectively.


🔹 1) What You Get When You Download a Model

A typical downloaded model (especially from Ollama, Hugging Face, etc.) consists of the following main components:

✅ A. Model Weights (Parameters)

Usually large binary files (e.g., .bin, .pt, .safetensors, .ckpt).

These contain the actual learned values (neurons, weights, biases) from training.

The model cannot run without these files.

Example:

  • pytorch_model.bin – For PyTorch

  • ggml-model-q4_0.bin – Quantized weights used by llama.cpp (current llama.cpp and Ollama builds use the newer GGUF format, e.g. .gguf files)
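To see what a weights file actually is, it helps to look at one up close. A `.safetensors` file, for example, is just an 8-byte little-endian header length, a JSON header describing each tensor (name, dtype, shape, byte offsets), and then the raw tensor bytes. The sketch below builds a tiny in-memory safetensors blob and parses its header with the standard library; the tensor name and values are made up for illustration:

```python
import json
import struct

def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header of a .safetensors blob.

    Layout: 8-byte little-endian header length, then that many bytes
    of JSON mapping tensor names to their dtype/shape/data_offsets.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len])

# Build a minimal safetensors file in memory: one float32 tensor
# named "layer.weight" holding four values.
data = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)
header = json.dumps(
    {"layer.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header)) + header + data

info = read_safetensors_header(blob)
print(info["layer.weight"]["shape"])  # [2, 2]
```

The same approach works on a real downloaded file: read the first 8 bytes, then the header, and you can list every tensor in the model without loading gigabytes of weights.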

✅ B. Model Architecture / Config File

Defines the structure of the model (number of layers, hidden units, attention heads, etc.).

Often in a .json or .yaml file.

Example:

  • config.json – Specifies transformer type, vocabulary size, hidden dimensions, etc.
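Because config.json fully determines the architecture, you can derive useful numbers from it directly. The sketch below parses a hypothetical Hugging Face-style config (field names vary between model families, and real files contain many more entries) and computes the per-head dimension and the size of the token-embedding matrix:

```python
import json

# A hypothetical config.json; real configs hold many more fields.
config_text = """
{
  "model_type": "llama",
  "vocab_size": 32000,
  "hidden_size": 4096,
  "num_hidden_layers": 32,
  "num_attention_heads": 32
}
"""

config = json.loads(config_text)

# Each attention head gets an equal slice of the hidden dimension.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# The token-embedding matrix alone is vocab_size x hidden_size values.
embedding_params = config["vocab_size"] * config["hidden_size"]

print(head_dim)          # 128
print(embedding_params)  # 131072000
```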

✅ C. Tokenizer Files

Preprocessing logic that turns text into tokens and vice versa.

Includes vocabulary (vocab.json), merges (merges.txt), or tokenizer config.

Example:

  • tokenizer.json

  • vocab.txt

  • merges.txt

  • tokenizer_config.json
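Those vocab and merges files drive byte-pair encoding (BPE): the tokenizer repeatedly fuses character pairs according to the ranked merge list, then looks up the resulting tokens in the vocabulary. Here is a toy sketch of that process; the vocab and merge tables are invented for illustration (real files hold tens of thousands of entries), and real tokenizers apply merges more efficiently:

```python
# Toy stand-ins for vocab.json (token -> id) and merges.txt
# (pairs to fuse, in priority order).
vocab = {"h": 0, "e": 1, "l": 2, "o": 3, "ll": 4, "hell": 5, "hello": 6, "he": 7}
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]

def bpe_tokenize(word: str) -> list[int]:
    """Apply merge rules in priority order, then map tokens to ids."""
    tokens = list(word)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i : i + 2] = [a + b]  # fuse the pair in place
            else:
                i += 1
    return [vocab[t] for t in tokens]

print(bpe_tokenize("hello"))  # [6] -- the whole word is one token
print(bpe_tokenize("he"))     # [7]
```

This is why the same model behaves differently with a mismatched tokenizer: the ids fed into the network depend entirely on these files.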

✅ D. Generation Scripts or Runners

Python or binary files that allow you to run the model (with inference loops, prompts, sampling settings, etc.).

Ollama wraps this in a unified runtime using Modelfile and its CLI.

In frameworks like Hugging Face:

  • run_generation.py

  • model.py
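At its core, the loop a runner implements is repeated next-token sampling: scale the model's logits by a temperature, softmax them into probabilities, and draw a token. The sketch below shows that single step over a toy vocabulary (real runners score tens of thousands of tokens each iteration and add tricks like top-k/top-p filtering):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """One sampling step: softmax over temperature-scaled logits,
    then a weighted random draw."""
    rng = random.Random(seed)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

# Low temperature makes the draw nearly deterministic:
print(sample_next_token({"the": 5.0, "a": 1.0, "cat": 0.5}, temperature=0.01, seed=0))
```

Sampling settings like temperature live in the runner (or in Ollama's Modelfile parameters), not in the weights, which is why the same model can feel "creative" or "precise" depending on configuration.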

✅ E. Quantization Information (Optional)

If you're using a quantized model (like from llama.cpp or ggml), there may be metadata about how the weights are compressed.

Affects performance and memory usage.
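The idea behind quantization is simple even if schemes like ggml's q4_0 differ in detail: store each block of weights as one floating-point scale plus small integers. Below is a simplified symmetric 4-bit sketch (not the exact ggml layout) showing the round trip and the approximation error it introduces:

```python
def quantize_block(values):
    """Symmetric 4-bit-style quantization: one scale + ints in -7..7."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7
    return scale, [round(v / scale) for v in values]

def dequantize_block(scale, q):
    """Recover approximate floats from the scale and integers."""
    return [scale * x for x in q]

weights = [0.12, -0.40, 0.33, 0.07]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)

# `restored` only approximates `weights` -- the small error is the
# price of storing ~4 bits per weight instead of 32.
print(q)
print(restored)
```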

✅ F. Prompt Templates / System Instructions (Optional in LLMs)

Ollama models often include template prompts (like system, user, assistant roles).

These guide how prompts are injected before inference.
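Rendering such a template is just string substitution: the runner slots the system and user messages into model-specific markers before inference. The template string below is hypothetical, in the spirit of an Ollama Modelfile TEMPLATE directive; real templates vary per model family:

```python
# Hypothetical chat template; real ones differ between model families.
TEMPLATE = "<|system|>\n{system}\n<|user|>\n{user}\n<|assistant|>\n"

def render_prompt(system: str, user: str) -> str:
    """Inject the role messages into the template before inference."""
    return TEMPLATE.format(system=system, user=user)

prompt = render_prompt("You are a concise assistant.", "What is a tokenizer?")
print(prompt)
```

A model fine-tuned on one template often degrades noticeably when served with another, which is why these files ship alongside the weights.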


🔹 2) How You Can Divide and Understand It

| Component | Purpose | Editable? | Tool Used |
| --- | --- | --- | --- |
| Model Weights | Learned knowledge of the model | ❌ Not easily editable | Python, llama.cpp, Ollama |
| Config/Architecture | Structure of the neural network | ✅ Yes | Text editor |
| Tokenizer | Converts words into model-readable format | ✅ Yes | Hugging Face, tiktoken |
| Prompt Templates | Controls the input prompt formatting | ✅ Yes | Modelfile in Ollama |
| Quantization Info | Enables smaller model sizes and faster runs | ✅ Yes (with tools) | llama.cpp, ggml |

🔹 3) If You're Using Ollama

When you run `ollama pull llama3`, for example, Ollama downloads a model package behind the scenes that contains:

✅ A quantized binary model file (.bin)

✅ A Modelfile (like a Dockerfile for models)

✅ Prompt format templates (like system, user, etc.)

You can run:

```
ollama show llama3
```

to inspect the components and configuration.


🔹 4) How to Empower Yourself

Here’s how you can gain deeper control:

| Goal | What to Learn |
| --- | --- |
| Fine-tune a model | Hugging Face Transformers, PyTorch/TF basics |
| Quantize for performance | ggml, llama.cpp, GPTQ, bitsandbytes |
| Build custom prompts | Prompt engineering, prompt templates in Ollama |
| Modify architecture | Model config files (config.json) |
| Tokenizer tuning or replacement | tokenizers library, vocab files |

🔹 Summary Diagram

Downloaded Model

```
┌──────────────────────┐
│   Model Weights      │  (.bin, .pt, .safetensors)
└──────────────────────┘
┌──────────────────────┐
│ Config / Architecture│  (config.json)
└──────────────────────┘
┌──────────────────────┐
│ Tokenizer Files      │  (vocab.json, merges.txt)
└──────────────────────┘
┌──────────────────────┐
│ Prompt Templates     │  (system, user format)
└──────────────────────┘
┌──────────────────────┐
│ Runner / Wrapper     │  (Modelfile, Python scripts)
└──────────────────────┘
┌──────────────────────┐
│ Quantization Metadata│  (quant-info.json, ggml meta)
└──────────────────────┘
```
