<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jacob_D</title>
    <description>The latest articles on DEV Community by Jacob_D (@jacob_d).</description>
    <link>https://dev.to/jacob_d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1122158%2F1924ebe4-30e7-4f9c-82e8-3c139440475c.jpg</url>
      <title>DEV Community: Jacob_D</title>
      <link>https://dev.to/jacob_d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacob_d"/>
    <language>en</language>
    <item>
      <title>An Intro to Large Language Models and the Transformer Architecture: Talking to a calculator</title>
      <dc:creator>Jacob_D</dc:creator>
      <pubDate>Sun, 14 Dec 2025 07:20:24 +0000</pubDate>
      <link>https://dev.to/jacob_d/an-intro-to-large-language-models-and-the-transformer-architecture-talking-to-a-calculator-483j</link>
      <guid>https://dev.to/jacob_d/an-intro-to-large-language-models-and-the-transformer-architecture-talking-to-a-calculator-483j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All models are wrong, but some are useful.&lt;br&gt;
&lt;strong&gt;— George E. P. Box&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I want to tell you a story — or better yet, two stories.&lt;br&gt;
One is about the support I received while feeling down; the other is about the limits I ran into while working with Databricks. Both are connected to the GPT model.&lt;br&gt;
Not so long ago, I went through a period in my life when I needed to restructure things. I started reading books and searching for information on YouTube, but that wasn’t enough. So I decided to talk to ChatGPT, which made me feel genuinely listened to by suggesting further reading and viewing. Just as I would with a person, I corrected it when it was wrong, allowing it to improve and guide me toward areas of self-development I hadn’t even dreamed about. In this case, the model was very useful.&lt;br&gt;
Around the same time, I faced another challenge: I needed an easy way to compare two large DataFrames in Databricks. I asked the GPT model — dressed fancily in its Copilot attire — how to do it, and it suggested simply subtracting them. This did reveal differences, but it didn’t pinpoint what those differences were. The DataFrames I work with not only have lots of rows but usually many columns as well. I copied the differing rows into Excel to compare data types and values there, but that still didn’t help. In the end, only good old Stack Overflow could lend me a helping — though, as often, not the most welcoming — hand (I even wanted to link this issue here, but it got closed and removed in the meantime xD Ah, the culture of that place!).&lt;br&gt;
Why was GPT so helpful in the first situation, but not in the second? I believe the answer to this question lies in how the transformer architecture works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Large Language Model?
&lt;/h2&gt;

&lt;p&gt;Despite their impressive capabilities, large language models are essentially structured sets of numerical parameters — often billions — whose interactions are governed by a configurable neural network architecture. These parameters are arranged into matrices and vectors that emerge from the training process. Through exposure to enormous datasets, the model learns statistical relationships between tokens, building an internal representation of language.&lt;br&gt;
In the words of Andrej Karpathy,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A Large Language Model is simply a fancy autocomplete.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This humorous simplification highlights that, at a high level, LLMs simply predict the next piece of text. They do not reason; they do not think. Instead, they process vast amounts of numbers to generate human-like text. The largest and most capable models, those able to work through complex, PhD-level mathematical tasks, often contain tens or even hundreds of billions of unquantized parameters; however, due to their cost, they are not broadly used, and will not be in the near future.&lt;/p&gt;
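
&lt;p&gt;To make the "fancy autocomplete" idea concrete, here is a toy sketch of my own (not from any real model): a bigram table that predicts the next token purely from co-occurrence counts. An LLM replaces this lookup with billions of learned parameters, but the contract is the same: given the text so far, emit the most likely continuation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter, defaultdict

# Toy bigram "language model": for each token, count which token follows it.
corpus = "the model predicts the next token and the next token again".split()
following = defaultdict(Counter)
for current, successor in zip(corpus, corpus[1:]):
    following[current][successor] += 1

def predict_next(token):
    # Greedy decoding: return the continuation seen most often in training.
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # 'next' (seen twice after 'the', vs. 'model' once)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
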

&lt;h2&gt;
  
  
  Inside the Transformer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke0lb5ktbisgylnvf8uk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke0lb5ktbisgylnvf8uk.jpg" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;br&gt;
*Image borrowed from: &lt;a href="https://www.mygreatlearning.com/blog/understanding-transformer-architecture/" rel="noopener noreferrer"&gt;https://www.mygreatlearning.com/blog/understanding-transformer-architecture/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The true engine of modern LLMs is the transformer architecture. Depending on the specific model family, transformers may use decoder-only layers (as in GPT models), encoder-only layers, or a combination of encoder–decoder stacks.&lt;br&gt;
As embedding vectors flow through these transformer blocks, they undergo numerous learned transformations. The model uses mechanisms such as self-attention to evaluate how each token relates to all others in context. In doing so, it develops a deep representation of meaning, structure, and intent.&lt;br&gt;
Layer by layer, the vectors become more abstract and refined, capturing multiple semantic layers.&lt;/p&gt;
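
&lt;p&gt;The self-attention step can be sketched in a few lines of NumPy. This is a deliberately minimal, single-head, unmasked version with made-up dimensions and random stand-in weights; production transformers add multiple heads, masking, and many other refinements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Minimal single-head, unmasked self-attention sketch.
# d_model=4 and the random weights are illustrative, not from any real model.
rng = np.random.default_rng(0)
d_model = 4
tokens = rng.normal(size=(3, d_model))          # 3 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
scores = Q @ K.T / np.sqrt(d_model)             # how strongly each token attends to the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                            # each row: a context-aware mix of values

print(weights.sum(axis=-1))  # each row of attention weights sums to 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
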

&lt;h2&gt;
  
  
  Tokenization: Breaking Language Down
&lt;/h2&gt;

&lt;p&gt;Before text enters a model, it must be converted into a format the model can understand. The text is split into chunks called tokens, which may represent characters, syllables, words, or subwords.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9u6fqhd21w6xwqsn34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9u6fqhd21w6xwqsn34.png" alt=" " width="143" height="33"&gt;&lt;/a&gt;&lt;br&gt;
*Real-life prompt tokenized&lt;/p&gt;

&lt;p&gt;Among various tokenization strategies, Byte-Pair Encoding (BPE) and its modern variants have become especially widespread, thanks to their efficiency and strong performance in popular models.&lt;/p&gt;
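
&lt;p&gt;The core of BPE is simple: repeatedly find the most frequent adjacent pair of symbols and fuse every occurrence into a new symbol. A toy single merge step might look like this (an illustration of the idea, not any real tokenizer's implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter

# Toy illustration of one BPE merge step.
def most_frequent_pair(symbols):
    return Counter(zip(symbols, symbols[1:])).most_common(1)[0][0]

def merge_pair(symbols, pair):
    merged, skip = [], False
    for left, right in zip(symbols, symbols[1:] + [""]):
        if skip:                      # right half of a just-merged pair: already consumed
            skip = False
            continue
        if (left, right) == pair:
            merged.append(left + right)
            skip = True
        else:
            merged.append(left)
    return merged

symbols = list("banana")
print(most_frequent_pair(symbols))        # ('a', 'n') and ('n', 'a') both occur twice
print(merge_pair(symbols, ("a", "n")))    # ['b', 'an', 'an', 'a']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Repeating this step builds a vocabulary of increasingly long subword units, which is why common words end up as single tokens while rare words are split into pieces.&lt;/p&gt;
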

&lt;h2&gt;
  
  
  Embedding: Moving From Words to Numbers
&lt;/h2&gt;

&lt;p&gt;Once tokenized, each discrete token is mapped to a continuous numerical vector through a process called embedding. These embeddings allow the model to work in a high-dimensional numerical space where patterns, relationships, and semantic meaning can be encoded. Tokens are processed individually and collectively, generating dense vector representations that form the basis for all later reasoning inside the model.&lt;/p&gt;
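
&lt;p&gt;Mechanically, the embedding step is just a table lookup: each token id selects one row of a learned matrix. The vocabulary and dimensions below are toy values for illustration; real models learn this matrix during training.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Embedding lookup sketch: each token id selects one row of a learned matrix.
# The vocabulary and d_model=4 are toy values; real models use far larger ones.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4
rng = np.random.default_rng(1)
embedding_matrix = rng.normal(size=(len(vocab), d_model))  # learned during training

token_ids = [vocab[word] for word in "the cat sat".split()]
embedded = embedding_matrix[token_ids]  # shape (3, d_model): one dense vector per token
print(embedded.shape)  # (3, 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
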

&lt;h2&gt;
  
  
  Output and Unembedding
&lt;/h2&gt;

&lt;p&gt;Finally, these refined vectors are passed through an unembedding (or output projection) layer. Here, the model converts its internal numerical representation into a probability distribution over the vocabulary, from which the next token is selected. The chosen tokens form the words and sentences that appear in the model’s output.&lt;/p&gt;
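
&lt;p&gt;A minimal sketch of that last step, with random stand-in weights: project the final hidden vector onto the vocabulary, softmax the logits into probabilities, and (greedily) pick the most likely token.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Unembedding sketch: project the final hidden vector onto the vocabulary,
# turn the logits into probabilities, and pick the top token.
vocab = ["the", "cat", "sat", "mat"]
d_model = 4
rng = np.random.default_rng(2)
hidden = rng.normal(size=d_model)                  # final hidden state for one position
unembed = rng.normal(size=(d_model, len(vocab)))   # output projection matrix

logits = hidden @ unembed
probs = np.exp(logits) / np.exp(logits).sum()      # softmax over the vocabulary
next_token = vocab[int(np.argmax(probs))]
print(next_token)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
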

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cj1kuuduh88gplcagkk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cj1kuuduh88gplcagkk.png" alt=" " width="263" height="81"&gt;&lt;/a&gt;&lt;br&gt;
*Real-life response tokenized&lt;/p&gt;

&lt;h2&gt;
  
  
  A Technology that Drives Technology
&lt;/h2&gt;

&lt;p&gt;Large language models do not understand the world in the same way humans do. This limitation leads to what are commonly referred to as hallucinations. However, as many of you have witnessed, they can still be quite useful in certain contexts. As Professor Aleksander Mądry said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI is not just any technology; it is a technology that accelerates other technologies and science. It serves as a highway to faster progress. Ignoring it wouldn’t be wise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Understanding how LLMs and transformers work is essential for making informed decisions about when and how to use them effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outro
&lt;/h2&gt;

&lt;p&gt;As you can see, the model doesn’t perceive text — or its semantics — the way we do. It breaks words into smaller chunks and converts these chunks into numerical representations. It then performs transformations on these vectors and, based primarily on the results of these transformations, generates an output.&lt;br&gt;
As I mentioned earlier, widely accessible models are often quantized, which essentially means they are less precise than they could be, in order to make them more affordable. This is why, in my experience, the GPT model was quite effective in helping me with a softer self-development task, but it struggled with the one that required detailed knowledge, where it couldn’t provide partial answers and be guided by me toward a final solution.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Shroomcast II: Class-based views and structure</title>
      <dc:creator>Jacob_D</dc:creator>
      <pubDate>Wed, 09 Aug 2023 14:49:38 +0000</pubDate>
      <link>https://dev.to/jacob_d/shroomcast-ii-class-based-views-and-structure-1hjh</link>
      <guid>https://dev.to/jacob_d/shroomcast-ii-class-based-views-and-structure-1hjh</guid>
<description>&lt;p&gt;Shroomcast is an app that I intended to use to learn Django Class-based views, as I indicated in my first article, which &lt;a href="https://www.linkedin.com/posts/jjdabrowski_python-django-activity-7084164705023401984-M8SI" rel="noopener noreferrer"&gt;I published on LinkedIn only&lt;/a&gt;.&lt;br&gt;
What are Class-based views, though?&lt;/p&gt;
&lt;h2&gt;
  
  
  View types in Django
&lt;/h2&gt;

&lt;p&gt;Django lets you create views in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The original function-based views&lt;/li&gt;
&lt;li&gt;The more advanced class-based views&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Function-based views
&lt;/h2&gt;

&lt;p&gt;The function-based view is a type of view that has been present in Django since its inception. It’s fairly simple, especially with Python’s easy-to-understand syntax. Function-based views will be the first views you create if you wish to learn Django.&lt;br&gt;
They mainly consist of a function definition followed by its body, where you can manipulate the data before rendering the outcome in the template. You typically return a template and some data (known as context in Django) from a function-based view, but you may also send back an HttpResponse object.&lt;br&gt;
Here's probably the simplest example of function-based view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.http import HttpResponse

def index(request):
    return HttpResponse("Hey, I won’t display weather, but I can pass a message!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s a bit more complicated one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def index(request):
    context = {“msg”: “I could display weather, but this guy wanted to use CBVs, so I won’t. At least not in this app…”}
    return render(request, “index.html”, context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both of these views display a message (though the second, on top of being more complex, needs a few more things to work properly than the first, such as a template to render). The first approach only lets you return a plain response; the second allows for many data transformations before anything is returned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Class-based views
&lt;/h2&gt;

&lt;p&gt;Class-based views allow you to quickly create basic web views such as a plain index page, a page with a form, or a paginated list.&lt;br&gt;
They are more challenging to implement than function-based views and fall short when it comes to building more complex views, for example ones that call several endpoints simultaneously and then parse the data received. If you want to create such a complicated view, you should switch back to function-based views.&lt;br&gt;
A function-based view performing several complex tasks will be hellishly long, but unlike a class-based view it won’t employ predefined, abstract methods and attributes. Sometimes creating a view that performs several complex tasks as a CBV may be almost impossible (been there, tried that).&lt;br&gt;
If you want to learn more about Class-based views, here are two great sources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.djangoproject.com/en/4.2/topics/class-based-views/" rel="noopener noreferrer"&gt;https://docs.djangoproject.com/en/4.2/topics/class-based-views/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=GxA2I-n8NR8&amp;amp;list=PLOLrQ9Pn6caxNb9eFZJ6LfY29nZkKmmXT" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=GxA2I-n8NR8&amp;amp;list=PLOLrQ9Pn6caxNb9eFZJ6LfY29nZkKmmXT&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
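
&lt;p&gt;For comparison with the function-based examples above, here is roughly what the second one looks like as a class-based view. This is my sketch, not code from Shroomcast; it assumes a template at index.html.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.views.generic import TemplateView

class IndexView(TemplateView):
    # Declarative attributes replace the function body of an FBV.
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        # Extend the default context instead of building it from scratch.
        context = super().get_context_data(**kwargs)
        context["msg"] = "Same message, now served by a class-based view."
        return context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
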
&lt;h2&gt;
  
  
  Shroomcast’s structure
&lt;/h2&gt;

&lt;p&gt;Shroomcast wasn't meant to be a complicated app, so CBVs were perfect for it. I needed the app to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tell the weather&lt;/li&gt;
&lt;li&gt;Store Visual Crossing Key and utilize it when making API calls&lt;/li&gt;
&lt;li&gt;Let the user know how many query points they have used and how many are still available&lt;/li&gt;
&lt;li&gt;Offer an informative About section, since all this new information might be overwhelming for a new user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose FormView to display the weather, since it lets you handle forms without creating objects in your database. To accept the Visual Crossing key and then store it, I chose CreateView, which is similar to FormView but does create objects in a database. For the view that informs the user about the query points they've used so far, I decided on the simple TemplateView. The same went for the "about" page, which is a so-called "wall of text".&lt;br&gt;
Since the user should be routed from the first page depending on whether they have previously provided the system with the Visual Crossing key, I designed IndexView as a RedirectView.&lt;/p&gt;
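
&lt;p&gt;That routing could be sketched like this. Note that the ApiKey model, its module path, and the URL names are my assumptions for illustration, not Shroomcast’s actual code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.urls import reverse
from django.views.generic import RedirectView

from shroomcast.models import ApiKey  # assumed model holding the Visual Crossing key

class IndexView(RedirectView):
    def get_redirect_url(self, *args, **kwargs):
        # Users with a stored key go straight to the weather form;
        # everyone else is sent to the key-creation page first.
        if ApiKey.objects.exists():
            return reverse("weather")
        return reverse("add-key")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
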
&lt;h2&gt;
  
  
  A challenge right at the beginning of the project
&lt;/h2&gt;

&lt;p&gt;Django views need templates to show anything to the user, and after I created my very first view, Django couldn’t “see” the template associated with it. I went to the documentation, which asks you to make paths like this:&lt;br&gt;
&lt;code&gt;app_name/templates/app_name/template_name.html&lt;/code&gt;&lt;br&gt;
From my point of view (as a person who has worked with words for more than 10 years), repeating the name in the route looks bad (yes, I prefer using clean code principles). So I searched for a way to make a simple folder structure for the templates, like:&lt;br&gt;
&lt;code&gt;app_core/templates/template_name.html&lt;/code&gt;&lt;br&gt;
and elsewhere in the docs I found that if you want to put the templates in a custom directory, you should add its path to the DIRS list in the TEMPLATES setting in settings.py, like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [BASE_DIR / 'templates'],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                (…),
            ],
        },
    },
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And so I found a way to keep the templates on the path I wanted, but I still wonder if there isn’t a more elegant method. If any Django experts happen to read this article, please share how you would approach this problem.&lt;br&gt;
I also wonder whether the clean code approach wasn’t my enemy in this case, and whether I shouldn’t have stayed with the default layout the documentation suggests 🤔&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Shroomcast uses Django’s class-based views, which are more elaborate but faster to implement. Five views of four types (TemplateView, RedirectView, FormView, and CreateView) make up the project’s features.&lt;br&gt;
If you’d like me to delve deeper into any of the topics I brought up in this article, please let me know and see you in the next article in a while 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>python</category>
      <category>django</category>
    </item>
  </channel>
</rss>
