<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jacob_D</title>
    <description>The latest articles on DEV Community by Jacob_D (@jacob_d).</description>
    <link>https://dev.to/jacob_d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1122158%2F1924ebe4-30e7-4f9c-82e8-3c139440475c.jpg</url>
      <title>DEV Community: Jacob_D</title>
      <link>https://dev.to/jacob_d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacob_d"/>
    <language>en</language>
    <item>
      <title>An Intro to Large Language Models and the Transformer Architecture: Talking to a calculator</title>
      <dc:creator>Jacob_D</dc:creator>
      <pubDate>Sun, 14 Dec 2025 07:20:24 +0000</pubDate>
      <link>https://dev.to/jacob_d/an-intro-to-large-language-models-and-the-transformer-architecture-talking-to-a-calculator-483j</link>
      <guid>https://dev.to/jacob_d/an-intro-to-large-language-models-and-the-transformer-architecture-talking-to-a-calculator-483j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All models are wrong, but some are useful.&lt;br&gt;
&lt;strong&gt;— George E. P. Box&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I want to tell you a story — or better yet, two stories.&lt;br&gt;
One is about the support I received while feeling down; the other is about the limits I ran into while working with Databricks. Both are connected to the GPT model.&lt;br&gt;
Not so long ago, I went through a period in my life when I needed to restructure things. I started reading books and searching for information on YouTube, but that wasn’t enough. So I decided to talk to ChatGPT, which made me feel genuinely listened to by suggesting further reading and viewing. Just as I would with a person, I corrected it when it was wrong, allowing it to improve and guide me toward areas of self-development I hadn’t even dreamed about. In this case, the model was very useful.&lt;br&gt;
Around the same time, I faced another challenge: I needed an easy way to compare two large DataFrames in Databricks. I asked the GPT model — dressed fancily in its Copilot attire — how to do it, and it suggested simply subtracting them. This did reveal differences, but it didn’t pinpoint what those differences were. The DataFrames I work with not only have lots of rows but usually many columns as well. I copied the differing rows into Excel to compare data types and values there, but that still didn’t help. In the end, only good old Stack Overflow could lend me a helping — though, as often, not the most welcoming — hand (I even wanted to link this issue here, but it got closed and removed in the meantime xD Ah, the culture of that place!).&lt;br&gt;
Why was GPT so helpful in the first situation, but not in the second? I believe the answer to this question lies in how the transformer architecture works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Large Language Model?
&lt;/h2&gt;

&lt;p&gt;Despite their impressive capabilities, large language models are essentially structured sets of numerical parameters — often billions — whose interactions are governed by a configurable neural network architecture. These parameters are arranged into matrices and vectors that emerge from the training process. Through exposure to enormous datasets, the model learns statistical relationships between tokens, building an internal representation of language.&lt;br&gt;
In the words of Andrej Karpathy,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A Large Language Model is simply a fancy autocomplete.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This humorous simplification highlights that, at a high level, LLMs simply predict the next piece of text. They do not reason; they do not think. Instead, they process vast amounts of numbers to generate human-like text. The largest and most capable models, those able to work through complex, PhD-level mathematical tasks, often contain tens or even hundreds of billions of unquantized parameters; however, due to their cost, they are not broadly used, and will not be in the near future.&lt;/p&gt;
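
&lt;p&gt;To make the "fancy autocomplete" idea concrete, here is a toy sketch of my own (not from any real model): a bigram table that predicts the next token purely from co-occurrence counts. An LLM replaces this lookup with billions of learned parameters, but the contract is the same: given the text so far, emit the most likely continuation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter, defaultdict

# Toy bigram "language model": for each token, count which token follows it.
corpus = "the model predicts the next token and the next token again".split()
following = defaultdict(Counter)
for current, successor in zip(corpus, corpus[1:]):
    following[current][successor] += 1

def predict_next(token):
    # Greedy decoding: return the continuation seen most often in training.
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # 'next' (seen twice after 'the', vs. 'model' once)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
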

&lt;h2&gt;
  
  
  Inside the Transformer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke0lb5ktbisgylnvf8uk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke0lb5ktbisgylnvf8uk.jpg" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;br&gt;
*Image borrowed from: &lt;a href="https://www.mygreatlearning.com/blog/understanding-transformer-architecture/" rel="noopener noreferrer"&gt;https://www.mygreatlearning.com/blog/understanding-transformer-architecture/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The true engine of modern LLMs is the transformer architecture. Depending on the specific model family, transformers may use decoder-only layers (as in GPT models), encoder-only layers, or a combination of encoder–decoder stacks.&lt;br&gt;
As embedding vectors flow through these transformer blocks, they undergo numerous learned transformations. The model uses mechanisms such as self-attention to evaluate how each token relates to all others in context. In doing so, it develops a deep representation of meaning, structure, and intent.&lt;br&gt;
Layer by layer, the vectors become more abstract and refined, capturing multiple semantic layers.&lt;/p&gt;
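
&lt;p&gt;The self-attention step can be sketched in a few lines of NumPy. This is a deliberately minimal, single-head, unmasked version with made-up dimensions and random stand-in weights; production transformers add multiple heads, masking, and many other refinements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Minimal single-head, unmasked self-attention sketch.
# d_model=4 and the random weights are illustrative, not from any real model.
rng = np.random.default_rng(0)
d_model = 4
tokens = rng.normal(size=(3, d_model))          # 3 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
scores = Q @ K.T / np.sqrt(d_model)             # how strongly each token attends to the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                            # each row: a context-aware mix of values

print(weights.sum(axis=-1))  # each row of attention weights sums to 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
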

&lt;h2&gt;
  
  
  Tokenization: Breaking Language Down
&lt;/h2&gt;

&lt;p&gt;Before text enters a model, it must be converted into a format the model can understand. The text is split into chunks called tokens, which may represent characters, syllables, words, or subwords.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9u6fqhd21w6xwqsn34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9u6fqhd21w6xwqsn34.png" alt=" " width="143" height="33"&gt;&lt;/a&gt;&lt;br&gt;
*Real-life prompt tokenized&lt;/p&gt;

&lt;p&gt;Among various tokenization strategies, Byte-Pair Encoding (BPE) and its modern variants have become especially widespread, thanks to their efficiency and strong performance in popular models.&lt;/p&gt;
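
&lt;p&gt;The core of BPE is simple: repeatedly find the most frequent adjacent pair of symbols and fuse every occurrence into a new symbol. A toy single merge step might look like this (an illustration of the idea, not any real tokenizer's implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter

# Toy illustration of one BPE merge step.
def most_frequent_pair(symbols):
    return Counter(zip(symbols, symbols[1:])).most_common(1)[0][0]

def merge_pair(symbols, pair):
    merged, skip = [], False
    for left, right in zip(symbols, symbols[1:] + [""]):
        if skip:                      # right half of a just-merged pair: already consumed
            skip = False
            continue
        if (left, right) == pair:
            merged.append(left + right)
            skip = True
        else:
            merged.append(left)
    return merged

symbols = list("banana")
print(most_frequent_pair(symbols))        # ('a', 'n') and ('n', 'a') both occur twice
print(merge_pair(symbols, ("a", "n")))    # ['b', 'an', 'an', 'a']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Repeating this step builds a vocabulary of increasingly long subword units, which is why common words end up as single tokens while rare words are split into pieces.&lt;/p&gt;
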

&lt;h2&gt;
  
  
  Embedding: Moving From Words to Numbers
&lt;/h2&gt;

&lt;p&gt;Once tokenized, each discrete token is mapped to a continuous numerical vector through a process called embedding. These embeddings allow the model to work in a high-dimensional numerical space where patterns, relationships, and semantic meaning can be encoded. Tokens are processed individually and collectively, generating dense vector representations that form the basis for all later reasoning inside the model.&lt;/p&gt;
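
&lt;p&gt;Mechanically, the embedding step is just a table lookup: each token id selects one row of a learned matrix. The vocabulary and dimensions below are toy values for illustration; real models learn this matrix during training.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Embedding lookup sketch: each token id selects one row of a learned matrix.
# The vocabulary and d_model=4 are toy values; real models use far larger ones.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4
rng = np.random.default_rng(1)
embedding_matrix = rng.normal(size=(len(vocab), d_model))  # learned during training

token_ids = [vocab[word] for word in "the cat sat".split()]
embedded = embedding_matrix[token_ids]  # shape (3, d_model): one dense vector per token
print(embedded.shape)  # (3, 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
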

&lt;h2&gt;
  
  
  Output and Unembedding
&lt;/h2&gt;

&lt;p&gt;Finally, these refined vectors are passed through an unembedding (or output projection) layer. Here, the model converts its internal numerical representation into a probability distribution over the vocabulary, from which the next token is selected. The chosen tokens form the words and sentences that appear in the model’s output.&lt;/p&gt;
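
&lt;p&gt;A minimal sketch of that last step, with random stand-in weights: project the final hidden vector onto the vocabulary, softmax the logits into probabilities, and (greedily) pick the most likely token.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Unembedding sketch: project the final hidden vector onto the vocabulary,
# turn the logits into probabilities, and pick the top token.
vocab = ["the", "cat", "sat", "mat"]
d_model = 4
rng = np.random.default_rng(2)
hidden = rng.normal(size=d_model)                  # final hidden state for one position
unembed = rng.normal(size=(d_model, len(vocab)))   # output projection matrix

logits = hidden @ unembed
probs = np.exp(logits) / np.exp(logits).sum()      # softmax over the vocabulary
next_token = vocab[int(np.argmax(probs))]
print(next_token)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
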

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cj1kuuduh88gplcagkk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cj1kuuduh88gplcagkk.png" alt=" " width="263" height="81"&gt;&lt;/a&gt;&lt;br&gt;
*Real-life response tokenized&lt;/p&gt;

&lt;h2&gt;
  
  
  A Technology that Drives Technology
&lt;/h2&gt;

&lt;p&gt;Large language models do not understand the world in the same way humans do. This limitation leads to what are commonly referred to as hallucinations. However, as many of you have witnessed, they can still be quite useful in certain contexts. As Professor Aleksander Mądry said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI is not just any technology; it is a technology that accelerates other technologies and science. It serves as a highway to faster progress. Ignoring it wouldn’t be wise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Understanding how LLMs and transformers work is essential for making informed decisions about when and how to use them effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outro
&lt;/h2&gt;

&lt;p&gt;As you can see, the model doesn’t perceive text — or its semantics — the way we do. It breaks words into smaller chunks and converts these chunks into numerical representations. It then performs transformations on these vectors and, based primarily on the results of these transformations, generates an output.&lt;br&gt;
As I mentioned earlier, widely accessible models are often quantized, which essentially means they are less precise than they could be, in order to make them more affordable. This is why, in my experience, the GPT model was quite effective in helping me with a softer self-development task, but it struggled with the one that required detailed knowledge, where it couldn’t provide partial answers and be guided by me toward a final solution.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Shroomcast II: Class-based views and structure</title>
      <dc:creator>Jacob_D</dc:creator>
      <pubDate>Wed, 09 Aug 2023 14:49:38 +0000</pubDate>
      <link>https://dev.to/jacob_d/shroomcast-ii-class-based-views-and-structure-1hjh</link>
      <guid>https://dev.to/jacob_d/shroomcast-ii-class-based-views-and-structure-1hjh</guid>
<description>&lt;p&gt;Shroomcast is an app that I intended to use to learn Django Class-based views, as I indicated in my first article, which &lt;a href="https://www.linkedin.com/posts/jjdabrowski_python-django-activity-7084164705023401984-M8SI" rel="noopener noreferrer"&gt;I published on LinkedIn only&lt;/a&gt;.&lt;br&gt;
What are Class-based views, though?&lt;/p&gt;
&lt;h2&gt;
  
  
  View types in Django
&lt;/h2&gt;

&lt;p&gt;Django lets you create views in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The original function-based views&lt;/li&gt;
&lt;li&gt;The more advanced class-based views&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Function-based views
&lt;/h2&gt;

&lt;p&gt;The function-based view is a type of view that has been present in Django since its inception. It’s fairly simple, especially with Python’s easy-to-understand syntax. Function-based views will be the first views you create if you wish to learn Django.&lt;br&gt;
They mainly consist of a function definition followed by its body, where you can manipulate the data before rendering the outcome in the template. You typically return a template and some data (known as context in Django) from a function-based view, but you may also send back an HttpResponse object.&lt;br&gt;
Here's probably the simplest example of function-based view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.http import HttpResponse

def index(request):
    return HttpResponse("Hey, I won’t display weather, but I can pass a message!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s a bit more complicated one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def index(request):
    context = {“msg”: “I could display weather, but this guy wanted to use CBVs, so I won’t. At least not in this app…”}
    return render(request, “index.html”, context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both of these views display a message (though the second, on top of being more complex, needs a few more things to work properly than the first, such as a template to render). The first approach only lets you return a plain response; the second allows for many data transformations before anything is returned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Class-based views
&lt;/h2&gt;

&lt;p&gt;Class-based views allow you to quickly create basic web views such as a plain index page, a page with a form, or a paginated list.&lt;br&gt;
They are more challenging to implement than function-based views and fall short when it comes to building more complex views, for example ones that call several endpoints simultaneously and then parse the data received. If you want to create such a complicated view, you should switch back to function-based views.&lt;br&gt;
A function-based view performing several complex tasks will be hellishly long, but unlike a class-based view it won’t employ predefined, abstract methods and attributes. Sometimes creating a view that performs several complex tasks as a CBV may be almost impossible (been there, tried that).&lt;br&gt;
If you want to learn more about Class-based views, here are two great sources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.djangoproject.com/en/4.2/topics/class-based-views/" rel="noopener noreferrer"&gt;https://docs.djangoproject.com/en/4.2/topics/class-based-views/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=GxA2I-n8NR8&amp;amp;list=PLOLrQ9Pn6caxNb9eFZJ6LfY29nZkKmmXT" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=GxA2I-n8NR8&amp;amp;list=PLOLrQ9Pn6caxNb9eFZJ6LfY29nZkKmmXT&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
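
&lt;p&gt;For comparison with the function-based examples above, here is roughly what the second one looks like as a class-based view. This is my sketch, not code from Shroomcast; it assumes a template at index.html.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.views.generic import TemplateView

class IndexView(TemplateView):
    # Declarative attributes replace the function body of an FBV.
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        # Extend the default context instead of building it from scratch.
        context = super().get_context_data(**kwargs)
        context["msg"] = "Same message, now served by a class-based view."
        return context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
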
&lt;h2&gt;
  
  
  Shroomcast’s structure
&lt;/h2&gt;

&lt;p&gt;Shroomcast wasn't meant to be a complicated app, so CBVs were perfect for it. I needed the app to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tell the weather&lt;/li&gt;
&lt;li&gt;Store Visual Crossing Key and utilize it when making API calls&lt;/li&gt;
&lt;li&gt;Let the user know how many query points they have used and how many are still available&lt;/li&gt;
&lt;li&gt;Offer an informative About section, since all this new information might be overwhelming for a new user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose FormView to display the weather, since it lets you handle forms without creating objects in your database. To accept the Visual Crossing key and then store it, I chose CreateView, which is similar to FormView but does create objects in a database. For the view that informs the user about the query points they've used so far, I decided on the simple TemplateView. The same went for the "about" page, which is a so-called "wall of text".&lt;br&gt;
Since the user should be routed from the first page depending on whether they have previously provided the system with the Visual Crossing key, I designed IndexView as a RedirectView.&lt;/p&gt;
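
&lt;p&gt;That routing could be sketched like this. Note that the ApiKey model, its module path, and the URL names are my assumptions for illustration, not Shroomcast’s actual code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.urls import reverse
from django.views.generic import RedirectView

from shroomcast.models import ApiKey  # assumed model holding the Visual Crossing key

class IndexView(RedirectView):
    def get_redirect_url(self, *args, **kwargs):
        # Users with a stored key go straight to the weather form;
        # everyone else is sent to the key-creation page first.
        if ApiKey.objects.exists():
            return reverse("weather")
        return reverse("add-key")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
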
&lt;h2&gt;
  
  
  A challenge right at the beginning of the project
&lt;/h2&gt;

&lt;p&gt;Django views need templates to show anything to the user, and after I created my very first view, Django couldn’t “see” the template associated with it. I went to the documentation, which asks you to make paths like this:&lt;br&gt;
&lt;code&gt;app_name/templates/app_name/template_name.html&lt;/code&gt;&lt;br&gt;
From my point of view (as a person who has worked with words for more than 10 years), repeating the name in the route looks bad (yes, I prefer using clean code principles). So I searched for a way to make a simple folder structure for the templates, like:&lt;br&gt;
&lt;code&gt;app_core/templates/template_name.html&lt;/code&gt;&lt;br&gt;
and elsewhere in the docs I found that if you want to put the templates in a custom directory, you should add its path to the DIRS list in the TEMPLATES setting in settings.py, like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [BASE_DIR / 'templates'],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                (…),
            ],
        },
    },
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And so I found a way to keep the templates on the path I wanted, but I still wonder if there isn’t a more elegant method. If any Django experts happen to read this article, please share how you would approach this problem.&lt;br&gt;
I also wonder whether the clean code approach wasn’t my enemy in this case, and whether I shouldn’t have stayed with the default layout the documentation suggests 🤔&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Shroomcast uses Django’s class-based views, which are more elaborate but faster to implement. Five views of four types (TemplateView, RedirectView, FormView, and CreateView) make up the project’s features.&lt;br&gt;
If you’d like me to delve deeper into any of the topics I brought up in this article, please let me know and see you in the next article in a while 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>python</category>
      <category>django</category>
    </item>
  </channel>
</rss>
