<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alessandro De Simone</title>
    <description>The latest articles on DEV Community by Alessandro De Simone (@alexdesi).</description>
    <link>https://dev.to/alexdesi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F647819%2F56122523-85ae-4fb6-bc00-62f7e827332c.jpeg</url>
      <title>DEV Community: Alessandro De Simone</title>
      <link>https://dev.to/alexdesi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexdesi"/>
    <language>en</language>
    <item>
      <title>63 Q&amp;As from Watching Karpathy's LLM Tutorial Twice</title>
      <dc:creator>Alessandro De Simone</dc:creator>
      <pubDate>Tue, 03 Feb 2026 18:55:21 +0000</pubDate>
      <link>https://dev.to/alexdesi/63-qas-from-watching-karpathys-llm-tutorial-twice-1pcl</link>
      <guid>https://dev.to/alexdesi/63-qas-from-watching-karpathys-llm-tutorial-twice-1pcl</guid>
      <description>&lt;p&gt;The video "Deep Dive into LLMs like ChatGPT" by Andrej Karpathy (3.5 hours) is one of the most insightful tutorials on Large Language Models.&lt;/p&gt;

&lt;p&gt;I learned a lot about LLMs by watching and studying it.&lt;/p&gt;

&lt;p&gt;I watched it twice. The first time, I paid attention but didn’t try to understand everything.&lt;/p&gt;

&lt;p&gt;The second time was a much slower process. I paused the video every time Andrej explained a concept worth remembering. Each time, I wrote a question and an answer.&lt;/p&gt;

&lt;p&gt;I tried to reuse Andrej’s explanations as much as possible, but sometimes they were too verbose, so I had to condense them into a few lines. This was an incredible learning exercise, though not a quick one.&lt;/p&gt;

&lt;p&gt;By the end of the video, I had written 63 Q&amp;amp;As, which I polished using ChatGPT, but only to fix grammar and spelling.&lt;/p&gt;

&lt;p&gt;If you’ve watched &lt;a href="https://www.youtube.com/watch?v=7xTGNNLPyMI" rel="noopener noreferrer"&gt;Deep Dive into LLMs like ChatGPT&lt;/a&gt; (and you should), use these Q&amp;amp;As to check what you’ve learned about LLMs.&lt;/p&gt;

&lt;h2&gt;Pre-Training&lt;/h2&gt;

&lt;p&gt;1. What are the three stages to train a Large Language Model (LLM) like ChatGPT?&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Pre-training: learning general language patterns from large amounts of text&lt;/li&gt;
&lt;li&gt;Post-training (Supervised Fine-Tuning, SFT): learning to behave like an assistant from curated human–assistant conversations&lt;/li&gt;
&lt;li&gt;Reinforcement Learning (including RLHF): improving answers through practice and rewards&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;2. What is the primary source of data used to pre-train LLMs?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The primary source of data is text scraped from the web.&lt;br&gt;&lt;br&gt;
Common Crawl is one of the major sources of data crawled from the web.&lt;br&gt;&lt;br&gt;
Other sources include books, academic papers, and articles.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;3. What is Common Crawl?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Common Crawl is a nonprofit organization that regularly crawls the web and makes petabytes of web data freely available to the public.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;4. Is raw web-scraped data suitable for training as it is?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, the raw data must be filtered in many ways.&lt;br&gt;&lt;br&gt;
Raw data is noisy and full of duplicate content, low-quality text, and irrelevant information. Before training, it requires heavy filtering.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;5. What kinds of filters and cleaning must be applied to raw data for LLM training?&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Step 1: URL filtering.&lt;br&gt;This involves filtering out URLs and domains we do not want in our dataset. This includes malware, pornographic content, racist material, and more.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 2: Text extraction.&lt;br&gt;Web pages extracted by crawlers are in raw HTML. This step removes HTML tags, scripts, and CSS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step 3: Language filtering.&lt;br&gt;Select only pages that correspond to a specific language. If we are not interested in creating a model that can chat in Italian, we can filter out all Italian pages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other steps: There are various minor steps. One worth mentioning is PII (Personally Identifiable Information) removal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;6. What is tokenization, and why is it a critical step in training LLMs?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is about changing the representation of text.&lt;/p&gt;

&lt;p&gt;We want to represent text as sequences of symbols, and the neural network is trained on those sequences.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;7. Why is tokenizing text into sequences using only a few symbols a bad idea?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The sequence length a neural network can process is a very finite and precious resource, and we do not want long sequences made of very few symbols.&lt;br&gt;&lt;br&gt;
A vocabulary size of just two symbols (0 and 1), or even 256 symbols, is too small.&lt;br&gt;&lt;br&gt;
In production language models, we must go beyond 256 symbols.&lt;br&gt;&lt;br&gt;
This is done by running what is called the Byte Pair Encoding (BPE) algorithm.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
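To make the tradeoff concrete, here is a quick sketch in plain Python (assuming ordinary UTF-8 text) comparing a 2-symbol representation with a 256-symbol one:

```python
text = "hello world"

# Vocabulary of 2 symbols: each byte becomes eight "0"/"1" characters
bits = "".join(f"{b:08b}" for b in text.encode("utf-8"))

# Vocabulary of 256 symbols: one id per raw byte
byte_ids = list(text.encode("utf-8"))

print(len(bits), len(byte_ids))  # 88 vs 11: same text, 8x longer sequence
```

Both represent the same text exactly; the smaller the vocabulary, the longer the sequence the network has to process.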

&lt;p&gt;8. How does the Byte Pair Encoding algorithm work?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It works by looking for consecutive bytes that occur very frequently.&lt;br&gt;&lt;br&gt;
For example, if the sequence 116 followed by 32 occurs often, we group this pair into a new symbol with ID 256 and replace every occurrence of the pair 116–32 with this new symbol.&lt;br&gt;&lt;br&gt;
We then iterate this algorithm as many times as we wish. Each time we mint a new symbol, the sequence length decreases and the vocabulary size increases.&lt;/p&gt;

&lt;p&gt;This process of converting raw text into these symbols (usually called tokens) is called tokenization.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
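A toy sketch of the merge loop in Python (not a production tokenizer; real BPE implementations also record the learned merges so they can encode new text later):

```python
from collections import Counter

def bpe_compress(ids, num_merges):
    """Toy BPE: repeatedly replace the most frequent adjacent pair
    of ids with a newly minted token id (starting at 256)."""
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count == 1:
            break  # no pair repeats, nothing left to compress
        out, i = [], 0
        while i != len(ids):
            # replace each occurrence of the pair (a, b) with the new id
            if i + 1 != len(ids) and (ids[i], ids[i + 1]) == (a, b):
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return ids

ids = list("the cat sat on the mat".encode("utf-8"))  # raw bytes: vocab of 256
compressed = bpe_compress(ids, num_merges=5)
print(len(ids), "->", len(compressed))  # sequence shrinks as vocab grows
```

Each merge mints one new symbol, so after five merges the vocabulary has grown by up to five ids while the sequence has become shorter.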

&lt;p&gt;9. What is an LLM’s vocabulary size, and why does it matter?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The vocabulary size is the total number of possible tokens.&lt;br&gt;&lt;br&gt;
If the vocabulary is too small, the sequence representing a text becomes enormous.&lt;br&gt;&lt;br&gt;
Shorter sequences are preferable, but not too short, as that would lead to an overly large vocabulary.&lt;/p&gt;

&lt;p&gt;A good vocabulary size turns out to be around 100,000 possible symbols. For example, GPT-4 uses 100,277 tokens.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;10. What is TikTokenizer?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is a helpful web application that shows how a text is tokenized.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;11. What does Andrej K. mean by "windows of tokens"?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They are random sequences of tokens extracted from a large corpus of text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;12. What is a good size for token windows?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Andrej says that 8,000 tokens is a good maximum length, and the minimum size is 0.&lt;br&gt;&lt;br&gt;
This means sequences can be anywhere between 0 and 8,000 tokens long.&lt;br&gt;&lt;br&gt;
According to him, 4,000 or 16,000 tokens work fine as the maximum length too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;13. What is the neural network of an LLM trained for?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is trained to predict the next token in a sequence of tokens.&lt;/p&gt;

&lt;p&gt;The goal is to train the model to learn the statistical relationships that describe how tokens follow one another.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;14. What are the input and output of the neural network?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The input is a sequence of tokens, and the output is a prediction of what comes next.&lt;/p&gt;

&lt;p&gt;Since the vocabulary contains around 100,000 possible tokens, the neural network produces exactly that many numbers. Each number represents the probability of a token being the next one in the sequence.&lt;br&gt;&lt;br&gt;
In short, it is making probabilistic guesses about what comes next.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
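A minimal sketch of that final step, with a hypothetical 3-token vocabulary standing in for the roughly 100,000 real ones:

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tiny vocabulary; a real model outputs ~100,000 scores
vocab = ["Paris", "London", "banana"]
logits = [4.0, 2.5, -1.0]   # one raw score per vocabulary token
probs = softmax(logits)     # probability of each token being next

# Sampling from this distribution is what makes the output stochastic
next_token = random.choices(vocab, weights=probs, k=1)[0]
print(next_token)
```

Because the next token is sampled rather than taken deterministically, running the same prompt twice can produce different continuations.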

&lt;p&gt;15. How does pre-training work?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pre-training is about computing all the parameters and weights of the neural network by feeding it random sequences extracted from the data and adjusting the weights based on the expected next tokens.&lt;/p&gt;

&lt;p&gt;Given the huge amount of data involved, pre-training a model can take months and cost hundreds of millions of dollars.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;16. Why is LLM output described as stochastic?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Because the output can change each time you run inference on the same input sequence.&lt;/p&gt;

&lt;p&gt;The model does not repeat verbatim what it was trained on. Instead it produces responses based on probabilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;17. What does inference refer to?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Inference is the process of using a trained model to predict the next tokens for a given prompt.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;18. What happens when the model is not trained?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An untrained model has randomly initialized weights, so it produces random tokens (nonsensical text).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;19. What has driven NVIDIA’s stock price to such a high level?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pre-training massive models takes months and requires powerful GPUs. Since every major tech company needs these GPUs for their models, demand has surged, pushing NVIDIA's stock price up sharply.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;20. What is a base model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A base model is the result of the pre-training stage (the first stage).&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;21. How does Andrej K. describe base models?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A base model is a powerful text autocomplete system that creates a remix of the internet.&lt;br&gt;&lt;br&gt;
As Andrej K. said, "It dreams internet pages."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;22. What are some web applications for running models like LLaMA 3?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The company Hyperbolic provides a Web App to run models like LLaMA 3 (and many other models): &lt;a href="https://app.hyperbolic.ai/" rel="noopener noreferrer"&gt;app.hyperbolic.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another good web service is &lt;a href="https://www.together.ai/" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;23. Can you get useful results from a base model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes, you can, but you must prompt the model smartly.&lt;br&gt;&lt;br&gt;
The billions of parameters store lots of knowledge about the world.&lt;/p&gt;

&lt;p&gt;You can elicit that knowledge with a prompt that is likely to be found on a web page.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;&lt;br&gt;
"Here is my top 10 list of landmarks to see in Paris:"&lt;br&gt;&lt;br&gt;
On the internet, there are many web pages that suggest Paris landmarks, so the recollection of the landmarks will be plausible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;24. Do the parameters store information in a lossless way?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No. The model stores the knowledge from the documents probabilistically, so it is a kind of lossy compression.&lt;/p&gt;

&lt;p&gt;When information is recollected via inference, content that appears very frequently on the internet has a higher chance of being remembered correctly compared to more infrequent documents.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;So you cannot fully trust the output, since the knowledge is not stored explicitly in the parameters.&lt;br&gt;&lt;br&gt;
It is more a probabilistic recollection of the internet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;25. What is a few-shot prompt?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is a prompt that contains some examples before asking a question.&lt;/p&gt;

&lt;p&gt;The model can infer a task from the examples and apply that task to new inputs.&lt;/p&gt;

&lt;p&gt;Example of few-shot prompt:&lt;br&gt;
"butterfly: farfalla, ocean: oceano, whisper: sussurro, mountain: montagna, thunder: tuono, gentle: gentile, freedom: libertà, umbrella: ombrello, cinnamon: cannella, moonlight: chiaro di luna, teacher:"&lt;/p&gt;

&lt;p&gt;Thanks to the examples, the model will infer the Italian translation for the word teacher: insegnante.&lt;/p&gt;

&lt;p&gt;This capability is called in-context learning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;26. Is it possible to use a base model as an assistant?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes, but you must provide a few-shot prompt of a dialog between human and assistant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;human: ...&lt;/li&gt;
&lt;li&gt;assistant: ...&lt;/li&gt;
&lt;li&gt;human: ...&lt;/li&gt;
&lt;li&gt;assistant: ...&lt;/li&gt;
&lt;li&gt;human: ...&lt;/li&gt;
&lt;li&gt;assistant: ...&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, to create a more reliable assistant, the model must be fine-tuned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Post-Training&lt;/h2&gt;

&lt;p&gt;27. What is the goal of post-training?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To create a useful assistant that answers users' questions.&lt;/p&gt;

&lt;p&gt;Pre-training gives the user a powerful autocomplete. Post-training turns that into an assistant that actually tries to help the user.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;28. What is the data input used to post-train the model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To train the model to behave like an assistant, we need many thousands of human–assistant conversations.&lt;/p&gt;

&lt;p&gt;These conversations are created by humans, often called labelers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;29. Pre-training or post-training: which one is more computationally expensive?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pre-training stage. It can take months and cost hundreds of millions of dollars. The major cost comes from renting data centers capable of training on huge amounts of data.&lt;/p&gt;

&lt;p&gt;Post-training takes only a few hours, which makes it much cheaper.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;30. How do we tokenize conversations into token sequences?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We use the same vocabulary of tokens used in pre-training, plus a few extra special tokens added during post-training.&lt;/p&gt;

&lt;p&gt;These special tokens are used to tag the human–assistant conversation.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;|im_start|&amp;gt;user&amp;lt;|im_sep|&amp;gt;
What is 2 + 2?
&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant&amp;lt;|im_sep|&amp;gt;
2 + 2 equals 4.
&amp;lt;|im_end|&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;31. What are three important principles contained in the labeling instructions given to human labelers at OpenAI?&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Helpful&lt;/li&gt;
&lt;li&gt;Truthful&lt;/li&gt;
&lt;li&gt;Harmless&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are only some of the principles contained in the policy manual that labelers need to study to write good answers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;LLM "Psychology"&lt;/h2&gt;

&lt;p&gt;32. What is the meaning of "hallucination"?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is when the model does not have enough knowledge stored in its parameters, but it still generates a response, which is a "best guess" (in terms of probability).&lt;/p&gt;

&lt;p&gt;Since the guess is not based on actual knowledge, it is often false and sometimes absurd.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;33. What are two possible ways to mitigate hallucinations?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These are two techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Post-training with "I don’t know" examples:&lt;br&gt;
A simple technique is to post-train the model on questions for which it does not know the answer, and explicitly teach it to respond with "I don’t know" (or a similar phrase) instead of guessing.&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Web search tool:&lt;br&gt;
Another approach mirrors human behavior: searching for information when the answer is unknown.&lt;br&gt;&lt;br&gt;
Modern LLMs can use web search tools to get useful information and add it to the context window. The model then answers the question using this new information, which greatly improves reliability.&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using these techniques, LLM providers have reduced hallucinations in their models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;34. How does Andrej describe the context window and the knowledge in the parameters?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The knowledge in the parameters offers a vague recollection (e.g. of something you read one month ago).&lt;br&gt;&lt;br&gt;
The knowledge in the tokens of the context window is like working memory (e.g. recent experiences that are fresh in our mind).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;35. Do LLMs have knowledge of self?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Andrej says that asking questions like "Who are you?" or "Who built you?" is nonsensical.&lt;br&gt;&lt;br&gt;
The model follows the statistical regularities of its training set.&lt;/p&gt;

&lt;p&gt;Old models reply to these kinds of questions with plausible but wrong answers (hallucinations).&lt;/p&gt;

&lt;p&gt;Newer ones are often trained to answer these questions and avoid hallucinations, but that does not make them self-aware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;36. What is the meaning of "models need tokens to think"?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With that sentence, Andrej states that LLMs don’t think silently; their "thinking" happens by generating tokens step by step.&lt;/p&gt;

&lt;p&gt;An LLM is trained to predict the next token, so any reasoning must be expressed as a sequence of tokens.&lt;/p&gt;

&lt;p&gt;In the video, he asks the model to solve this math problem:&lt;/p&gt;

&lt;p&gt;"Emily buys 3 apples and 2 oranges. Each orange costs $2. The total cost of all the fruit is $13. What is the cost of apples?"&lt;br&gt;&lt;br&gt;
There are two possible answers:&lt;br&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only the answer:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;$3&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer with reasoning tokens:&lt;/li&gt;
&lt;/ul&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2 oranges cost $4
13 − 4 = 9
9 / 3 = 3
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;For the second answer, the model writes the intermediate steps.&lt;br&gt;&lt;br&gt;
Those steps are the reasoning process (the "thinking"). The model uses tokens as a form of working memory to reason through the problem.&lt;/p&gt;

&lt;p&gt;In this case, the answer is much more likely to be correct.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
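Written out as code, the reasoning steps above are just:

```python
# The intermediate steps from the second answer, step by step
orange_price = 2
total = 13

orange_cost = 2 * orange_price     # 2 oranges cost $4
apples_cost = total - orange_cost  # 13 - 4 = 9
apple_price = apples_cost / 3      # 9 / 3 = 3
print(apple_price)  # 3.0
```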

&lt;p&gt;37. What is a more reliable way to ask ChatGPT to solve math problems?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Just add "Use code" at the end of the question, and the model will generate code that solves the problem (usually Python) and run it to get the response.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;38. Are LLMs good at counting? For example, "How many dots are in this string?"&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, LLMs often make mistakes when counting characters or words.&lt;/p&gt;

&lt;p&gt;In this case, adding "Use code" to the prompt will request the LLM to write and run Python code. The response is much more reliable, and you can even check the code’s accuracy.&lt;/p&gt;
&lt;/blockquote&gt;
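A sketch of the kind of Python the model would write and run for the dot-counting question (the string here is just an illustrative input):

```python
# Counting characters is trivial in code, even though the LLM
# itself often miscounts because it sees tokens, not characters
text = "...a...b....c.....d"
dot_count = text.count(".")
print(dot_count)  # 15
```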

&lt;p&gt;39. Are models good at spelling?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, because models do not see characters. They see tokens.&lt;br&gt;&lt;br&gt;
For example, if you ask the model to print every third character of a word, the model will probably fail.&lt;/p&gt;

&lt;p&gt;If you ask it to "Use code", you will get a correct response.&lt;/p&gt;
&lt;/blockquote&gt;
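For example, extracting every third character, which trips up the model when done "in its head", is a one-liner in Python:

```python
# Slicing operates on characters directly, so there is no
# token-boundary problem here
word = "strawberry"
print(word[::3])  # characters at positions 0, 3, 6, 9 -> "saey"
```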

&lt;h2&gt;Reinforcement Learning&lt;/h2&gt;

&lt;p&gt;40. Andrej uses the school textbook analogy to introduce Reinforcement Learning for LLMs. Can you tell which are the three classes of information in the textbook?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In a textbook, you can find the expositions, the problems and solutions, and the practice problems sections:&lt;/p&gt;

&lt;p&gt;The expositions: this is the knowledge base, the explanation of ideas and concepts.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;The problems and solutions: these are sections in the book in which the expert shows how to solve specific problems.&lt;/p&gt;

&lt;p&gt;The practice problems: these are critical for learning. They are the problems students use to practice; the final answers usually appear at the end of each chapter, but the steps to reach them are not shown.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;41. How does the textbook analogy map onto an LLM?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The expositions: pre-training stage. The model reads huge amounts of text and learns the statistical correlations between tokens.&lt;/p&gt;

&lt;p&gt;The problems &amp;amp; solutions: post-training stage. Supervised fine-tuning, in which the model is trained on thousands of questions (prompts) and ideal solutions and answers provided by human experts.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;Practice problems: reinforcement learning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;42. What is a company that publicly shared its Reinforcement Learning approach?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DeepSeek released a paper in which they talked publicly about their approach to RL in their LLMs and the improvements they obtained.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;43. In the RL stage, is the model trained using questions and correct answers?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, and that is the important distinction between Reinforcement Learning and Supervised Fine-Tuning.&lt;br&gt;&lt;br&gt;
The correct answers are not used to train the model.&lt;/p&gt;

&lt;p&gt;In the RL stage, the model generates the solutions and the final answers.&lt;br&gt;&lt;br&gt;
The correct answers are used only to check the correctness of the generated answers.&lt;/p&gt;

&lt;p&gt;A positive or negative reward is given to the model based on the comparison between the model’s answer and the correct answer.&lt;/p&gt;
&lt;/blockquote&gt;
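A minimal sketch of such a verifiable reward (the exact-match rule here is an illustrative simplification; real pipelines use more robust answer checking):

```python
def reward(generated_answer, correct_answer):
    """+1 if the model's final answer matches the known correct answer,
    -1 otherwise. Only the final answer is compared; the reasoning the
    model generated along the way is never part of the training data."""
    return 1 if generated_answer.strip() == correct_answer.strip() else -1

# Hypothetical attempts the model might generate for the same problem
attempts = ["3", "4", "3"]
rewards = [reward(a, "3") for a in attempts]
print(rewards)  # [1, -1, 1]
```

Attempts that earn a positive reward are reinforced; the model never sees the correct answer directly, only the score.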

&lt;p&gt;44. What are models trained with RL usually called?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They are usually called thinking or reasoning models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;45. What is the best use of a thinking model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To solve problems that require reasoning, like math and coding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;46. In which cases is it overkill to use a thinking model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For factual questions, where no reasoning is necessary.&lt;/p&gt;

&lt;p&gt;It is wasteful to use a thinking model because it requires more tokens and more computation.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;47. Why, in the context of the game Go, does Reinforcement Learning get better results than Supervised Learning?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Supervised learning is based on training a model on matches played by human experts. In this way, the model can be as good as the best players, but it cannot go beyond that.&lt;/p&gt;

&lt;p&gt;With RL, the system plays against itself.&lt;br&gt;&lt;br&gt;
It plays millions of matches, and only the winning ones are rewarded.&lt;br&gt;&lt;br&gt;
In this way, human performance is not a limit.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;In fact, Google DeepMind trained AlphaGo with RL so effectively that it beat top Go players like Lee Sedol.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;48. What are the kinds of problems that have &lt;strong&gt;verifiable domains&lt;/strong&gt;?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These are problems in which all candidate solutions are easy to score against a correct answer.&lt;/p&gt;

&lt;p&gt;The scoring and the reward can be done automatically, without human intervention.&lt;/p&gt;

&lt;p&gt;For example, in math problems it is easy to check if the final number is correct.&lt;br&gt;&lt;br&gt;
Logic games like chess and Go are also examples, in which it is possible to verify whether certain moves will end with a win or a loss.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;49. What are the kinds of problems that have unverifiable domains?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These are problems where the correctness and quality of the response are subjective and hard to measure.&lt;/p&gt;

&lt;p&gt;For example: "Write a joke about pelicans". Machines are bad at understanding humor, so only humans can score this kind of question.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;50. What is the meaning of RLHF?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is Reinforcement Learning from Human Feedback.&lt;br&gt;
RLHF is a form of RL that requires input from humans.&lt;br&gt;&lt;br&gt;
For example, humans rank or compare different answers based on their quality, providing preference data that helps train the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;51. How does RLHF work in practice for unverifiable tasks?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLM engineers create and train a separate reward model neural network to imitate human preferences.&lt;br&gt;&lt;br&gt;
This reward model is then used to score responses generated by the LLM, and reinforcement learning is applied to encourage higher-scoring outputs.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;&lt;br&gt;
Prompt: "Write a joke about pelicans" (asked 5 times).&lt;/p&gt;

&lt;p&gt;The LLM produces five different responses: a, b, c, d, e.&lt;/p&gt;

&lt;p&gt;The reward model assigns scores to these responses and ranks them from best to worst, approximating human preferences.&lt;/p&gt;

&lt;p&gt;Reinforcement learning then nudges the model to tell jokes more like the higher-ranked ones, which are potentially funnier.&lt;/p&gt;
&lt;/blockquote&gt;
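A sketch of the ranking step, with a toy stand-in for the reward model (the real one is a trained neural network, not a simple function like this):

```python
# Stand-in reward model: scores a response by its vocabulary variety.
# This is a placeholder metric purely so the ranking step is concrete.
def reward_model(response):
    return len(set(response.split()))

responses = [
    "joke a",
    "a pelican walks into a bar and orders a large bill",
    "joke",
]
ranked = sorted(responses, key=reward_model, reverse=True)
print(ranked[0])  # the response RL would nudge the model toward
```

Reinforcement learning then updates the model to make outputs like the top-ranked response more likely.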

&lt;p&gt;52. What is the discriminator-generator gap?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is much easier to tell whether something is good than to generate it.&lt;br&gt;&lt;br&gt;
You can often spot a bad explanation immediately, but generating a good explanation is much harder.&lt;/p&gt;

&lt;p&gt;That asymmetry is the discriminator–generator gap.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;53. Can you run RLHF as long as you want to improve an LLM indefinitely?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, after a certain number of iterations, usually a few hundred updates, the LLM starts degrading.&lt;br&gt;&lt;br&gt;
The reason is that LLMs start finding answers that trick the reward model, earning very high scores for nonsensical responses; in effect, they game the reward.&lt;/p&gt;

&lt;p&gt;In other words, the reward function is gameable, and LLMs are very good at that game, discovering inputs that are evaluated as excellent, even if they are nonsensical for real humans.&lt;/p&gt;

&lt;p&gt;So RLHF works, but in a limited way. You cannot run it for too long.&lt;br&gt;&lt;br&gt;
The solution is to stop RLHF before the model deteriorates.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;54. What are adversarial examples in RLHF?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They are nonsensical responses that the LLM learns to generate because they trick the reward model into giving them very high scores.&lt;/p&gt;

&lt;p&gt;The model exploits flaws in the reward model to maximize its score rather than actual quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;55. What is the main difference between RL in a verifiable domain and RLHF in an unverifiable domain?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can run RL for extended periods in a verifiable domain and still discover better solutions.&lt;br&gt;&lt;br&gt;
The game of Go is a good example in which RL applies well. DeepMind trained a model so well that it eventually beat the best Go player.&lt;/p&gt;

&lt;p&gt;RLHF is not the kind of RL that you can run for extended periods. At a certain point, the model starts generating bad responses that are scored highly by the reward model (a problem known as reward model overoptimization).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;56. What does Andrej mean by the "Swiss cheese model"?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Andrej uses the Swiss cheese metaphor to describe LLM capabilities.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;They work really well for certain things, but they fail in other cases, and they do so almost at random, like the holes in Swiss cheese.&lt;/p&gt;

&lt;p&gt;An example of a shortcoming that happened with early models of ChatGPT is:&lt;br&gt;&lt;br&gt;
"What is bigger, 9.11 or 9.9?"&lt;/p&gt;

&lt;p&gt;ChatGPT used to answer "9.11", which is of course wrong.&lt;/p&gt;

&lt;p&gt;Recent models have fixed this problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;57. Should you fully trust LLM responses?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No, you should not. Models are not infallible. They can hallucinate and fail in different ways (see the Swiss cheese model), but they are still powerful and useful tools.&lt;/p&gt;

&lt;p&gt;Use them for a first draft, for inspiration, to summarize, and for many other tasks, but do not fully trust them.&lt;br&gt;&lt;br&gt;
Be responsible for the work you create using LLMs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;58. What is a multimodal model?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is a model that can process not only text, but also audio, images, and video.&lt;br&gt;&lt;br&gt;
Those different media can be tokenized in a similar way to text, so multimodal models are not technically very different from text-only LLMs.&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;59. What are LLM agents?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agents are systems built around LLMs that use tools to perform tasks and report progress to humans.&lt;/p&gt;

&lt;p&gt;They can run for minutes or hours to complete longer jobs. Since models are not infallible, they benefit from human supervision, especially for critical tasks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;60. What is the biggest limitation of LLMs regarding learning?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The capacity to learn new things.&lt;/p&gt;

&lt;p&gt;LLMs ingest all their knowledge during the pre-training and post-training stages. After that, the models do not have the capacity to change their parameters, which means they cannot learn new things.&lt;/p&gt;

&lt;p&gt;You can use in-context learning and give the model examples in the prompt (aka few-shot prompting), but this is not real learning since the parameters do not change.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;Also, the context window is a finite and precious resource, especially when running multimodal tasks, so its use is limited.&lt;/p&gt;

&lt;p&gt;This is an open issue, and there is currently a lot of research to address it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;61. What is LMArena?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://lmarena.ai/" rel="noopener noreferrer"&gt;LMArena&lt;/a&gt; (also known as Chatbot Arena) is an LLM leaderboard that ranks top models based on human comparisons.&lt;br&gt;&lt;br&gt;
Two models are shown the same prompt, and humans compare their responses without knowing which model produced which answer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;62. In what way is the model DeepSeek-R1 different from Gemini or ChatGPT?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DeepSeek-R1 was released under the MIT license with open weights, so anyone can download it and freely host their own version of DeepSeek.&lt;br&gt;&lt;br&gt;
By contrast, Gemini, ChatGPT, and Claude have proprietary licenses.&lt;/p&gt;

&lt;p&gt;It was surprising that a model as powerful as DeepSeek-R1 was released with open weights. Hopefully, more companies will follow DeepSeek's example.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;63. What is LM Studio?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt; is an application to run LLMs on your computer.&lt;/p&gt;

&lt;p&gt;You probably cannot run top models locally, like DeepSeek-V3 with 671B parameters (you'd need hundreds of gigabytes of RAM and powerful GPUs), but fortunately there are smaller versions available, such as distilled or quantized models.&lt;br&gt;&lt;br&gt;
You can run these smaller models on a powerful MacBook Pro or Linux box (64–128 GB RAM). To run models more easily, you can use lower precision (quantization).&lt;/p&gt;
&lt;/blockquote&gt;
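&lt;p&gt;As a toy illustration of what quantization means (this is not LM Studio's actual scheme, just the basic idea), 32-bit floating-point weights can be mapped to 8-bit integers plus a scale factor:&lt;/p&gt;

```ruby
# Toy symmetric 8-bit quantization: map floats to integers in -127..127
# using one scale factor. Assumes at least one non-zero weight.
def quantize(weights)
  scale = weights.map(&:abs).max / 127.0
  ints = weights.map { |w| (w / scale).round.clamp(-127, 127) }
  [ints, scale]
end

def dequantize(ints, scale)
  ints.map { |i| i * scale }
end

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
ints, scale = quantize(weights)
approx = dequantize(ints, scale)
# Each recovered weight is close to the original, at a quarter of the
# storage of 32-bit floats.
```

&lt;p&gt;Real quantization schemes (per-block scales, 4-bit formats, and so on) are more elaborate, but the storage-versus-precision trade-off is the same.&lt;/p&gt;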

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LLMs are formidable tools, and the time you spend learning how to leverage them is totally worth it.&lt;/p&gt;

&lt;p&gt;I created these Q&amp;amp;As as personal notes, but I hope you found them inspiring and helpful.&lt;/p&gt;

&lt;p&gt;If you tried to answer the questions before revealing the answers, congratulations! You've just strengthened the neuron connections about LLMs in your brain! (Yes, that's actually how learning works!)&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>Why I stopped studying for the AWS Certification</title>
      <dc:creator>Alessandro De Simone</dc:creator>
      <pubDate>Sat, 26 Jul 2025 08:37:39 +0000</pubDate>
      <link>https://dev.to/alexdesi/why-i-stopped-studying-for-the-aws-certification-2h30</link>
      <guid>https://dev.to/alexdesi/why-i-stopped-studying-for-the-aws-certification-2h30</guid>
      <description>&lt;p&gt;I was falling into the same trap again.&lt;/p&gt;

&lt;p&gt;So I wrote this as a reminder to myself and to anyone considering a certification just for the sake of being "certified".&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I started the AWS Certification
&lt;/h2&gt;

&lt;p&gt;I had started studying for the AWS Developer Associate certification.&lt;/p&gt;

&lt;p&gt;As a backend engineer, I often work with AWS services, so improving my AWS skills made sense. It seemed like a smart career move—and a nice addition to my CV.&lt;/p&gt;

&lt;p&gt;With good intentions, I bought a couple of courses and mock exams on Udemy. I was ready to commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I stopped pursuing it
&lt;/h2&gt;

&lt;p&gt;After a few weeks of studying, it hit me: many of the services required for the certification were irrelevant to my actual work.&lt;/p&gt;

&lt;p&gt;Take AWS CloudFormation, for example. It's their proprietary Infrastructure as Code (IaC) tool, and it's mandatory for the exam.&lt;br&gt;
But I had no real interest in it.&lt;/p&gt;

&lt;p&gt;Why? Because most companies I've worked with, like the Legal Aid Agency (LAA), where I'm currently contracting, use Terraform (an open-source IaC tool). That's what I actually need to know. Not CloudFormation.&lt;/p&gt;

&lt;p&gt;The same logic applied to other services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt; is AWS's monitoring tool, but I'm more interested in &lt;strong&gt;Prometheus&lt;/strong&gt; and &lt;strong&gt;Grafana&lt;/strong&gt;, which are more widely adopted in modern DevOps stacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CodePipeline&lt;/strong&gt; and &lt;strong&gt;CodeBuild&lt;/strong&gt; are AWS's CI/CD tools, but most teams I work with use &lt;strong&gt;GitHub Actions&lt;/strong&gt; or &lt;strong&gt;CircleCI&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I realized I was spending time and energy learning tools I might never use—just to pass an exam.&lt;/p&gt;

&lt;h2&gt;
  
  
  My GCP Certification Experience
&lt;/h2&gt;

&lt;p&gt;This wasn't the first time.&lt;/p&gt;

&lt;p&gt;A few years ago, I spent months preparing for the &lt;strong&gt;Google Cloud Professional Data Engineer&lt;/strong&gt; certification.&lt;br&gt;
I learned about BigQuery, Bigtable, and several other Big Data tools.&lt;/p&gt;

&lt;p&gt;It was tough. I had zero experience with Big Data, and I failed the exam twice before finally passing.&lt;/p&gt;

&lt;p&gt;And you know what? I've never used that knowledge in a real project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Matters More Than Certification
&lt;/h2&gt;

&lt;p&gt;That experience should have taught me something: &lt;strong&gt;learning should be driven by what you actually use and not what a test requires.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Certifications can be valuable if they are strictly required by the company you want to work for (Hey, do you really want to work for them?).&lt;/p&gt;

&lt;p&gt;So I decided to abandon the AWS cert. If I need to learn a specific AWS service for work, I'll learn it—with the goal of applying it immediately.&lt;/p&gt;

&lt;p&gt;Time is limited. Focus on learning what makes you better at your job.&lt;/p&gt;

&lt;p&gt;Luckily, most companies these days — aside from a few exceptions — care more about real skills than just having a certificate.&lt;/p&gt;

&lt;p&gt;And that's great!&lt;/p&gt;

&lt;p&gt;It means you get to decide what's worth learning to grow as a developer.&lt;/p&gt;

</description>
      <category>career</category>
      <category>aws</category>
      <category>beginners</category>
      <category>discuss</category>
    </item>
    <item>
      <title>The Art of Reinventing the wheel in Software Development</title>
      <dc:creator>Alessandro De Simone</dc:creator>
      <pubDate>Tue, 25 Feb 2025 21:08:21 +0000</pubDate>
      <link>https://dev.to/alexdesi/the-art-of-reinventing-the-wheel-in-software-development-307f</link>
      <guid>https://dev.to/alexdesi/the-art-of-reinventing-the-wheel-in-software-development-307f</guid>
      <description>&lt;p&gt;Who said that you should not reinvent the wheel?&lt;/p&gt;

&lt;p&gt;I wrote my website &lt;a href="https://alessandro.desi" rel="noopener noreferrer"&gt;alessandro.desi&lt;/a&gt; from scratch using Python and the Flask framework.&lt;/p&gt;

&lt;p&gt;I actually tried WordPress for a while, but I was not happy with it. Too big and complex. I wanted something simple, customized to my needs.&lt;/p&gt;

&lt;p&gt;I am sure I could have found a simpler alternative, but I didn’t want to learn another blog system.&lt;/p&gt;

&lt;p&gt;I was pretty good with Ruby and Ruby on Rails, but I wanted to learn Python to widen my job opportunities.&lt;/p&gt;

&lt;p&gt;So I took it as a chance to learn a new language and framework. This website was my first project coded with Python/Flask.&lt;/p&gt;

&lt;p&gt;My experience with Ruby helped, but still, it took days of my free time to build it. It was much slower than expected, but I was keen, and I knew that practicing Python would be a valuable skill.&lt;/p&gt;

&lt;p&gt;Python/Flask was not the only thing I practiced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I wrote the pages in HTML and styled them with CSS.&lt;/li&gt;
&lt;li&gt;I recently added a subscription form to the website to start creating an email list. So I learned about the &lt;em&gt;Flask-SQLAlchemy&lt;/em&gt; package to save the data in a &lt;em&gt;SQLite&lt;/em&gt; database and &lt;em&gt;Flask-WTF&lt;/em&gt; to handle the HTML form and field validations.&lt;/li&gt;
&lt;li&gt;I also set up a VPS (Virtual Private Server) with Linux, installed a web server, and deployed the project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you decide to reinvent the wheel, there are so many overlooked details that you need to figure out.&lt;/p&gt;

&lt;p&gt;All of this probably took 30 times longer than just installing WordPress.&lt;/p&gt;

&lt;p&gt;Was it worth it?&lt;/p&gt;

&lt;p&gt;Yes, for me, it was.&lt;br&gt;
My work is coding, and I wanted to expand my skills while building something I needed.&lt;/p&gt;

&lt;p&gt;Should I always use this approach?&lt;br&gt;
Of course not!&lt;/p&gt;

&lt;p&gt;There are many reasons to reuse existing, popular wheels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They let you work smoothly as part of a team. Front-end programmers often use React, Angular, or Vue, and it’s easy to find a new developer who knows those JavaScript frameworks.&lt;/li&gt;
&lt;li&gt;They let you focus on the value proposition of your project rather than rebuilding basic functionality.&lt;/li&gt;
&lt;li&gt;They let you rely on a wide community that has extensively tested the project and can help resolve issues quickly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, most of the time, you should go with the safest choice: rely on an established solution!&lt;/p&gt;

&lt;p&gt;Anyway, I am always amazed by the stories of unconventional coders (and not only coders): people who are not happy with the status quo.&lt;/p&gt;

&lt;p&gt;Many of the most influential tools we use today exist because someone reinvented a widely used solution.&lt;br&gt;
Here are some of my favorite examples of rethought projects:&lt;/p&gt;

&lt;h2&gt;
  
  
  Ruby on Rails - Reinventing the Web Framework
&lt;/h2&gt;

&lt;p&gt;Rails is the web framework that allowed me to build my career in the UK when it was the first choice for startups.&lt;br&gt;
Many frameworks existed before Ruby on Rails. But David Heinemeier Hansson (aka DHH) decided to create his own framework while building Basecamp, focusing on developer happiness and convention over configuration. This "reinvention" revolutionized web development and influenced countless frameworks that came after.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hanami - (Re)Reinventing the Web Framework
&lt;/h2&gt;

&lt;p&gt;DHH released Rails in 2004, and after a few years, Rails became the dominant web framework.&lt;br&gt;
But despite that, Luca Guidi saw an opportunity for a different approach. He wanted to create a framework that better followed Object-Oriented Programming (OOP) practices and provided better separation of concerns.&lt;br&gt;
So, in 2014, Luca began developing Hanami (originally called Lotus).&lt;br&gt;
Hanami introduced a modular approach where each part of the application could be a separate micro-application.&lt;br&gt;
This wasn’t just different for the sake of being different—it was an architectural choice aimed at making applications easier to test and maintain over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis - (Re)Inventing the Key-Value Storage
&lt;/h2&gt;

&lt;p&gt;In 2009, Salvatore Sanfilippo was working on a real-time web analytics system for his startup. He needed a way to collect and analyze web page view data as it happened, but the existing databases were too slow. Also, the existing key-value stores didn’t provide the data structures he needed for analytics calculations.&lt;/p&gt;

&lt;p&gt;Sanfilippo decided to create something new that would perfectly fit his use case. He wanted a database that could handle high-speed operations in memory, support rich data structures beyond simple key-value pairs, and maintain data persistence when needed.&lt;/p&gt;

&lt;p&gt;In this case, Salvatore wasn’t just reinventing the wheel—he was optimizing and evolving it for a specific use case.&lt;/p&gt;

&lt;p&gt;This led to the birth of Redis (REmote DIctionary Server).&lt;br&gt;
Today, Redis is used by countless companies. It is used in most of the companies I’ve worked at, even at the Ministry of Justice (UK), where I am working now.&lt;/p&gt;




&lt;p&gt;I am sure you noticed I am totally biased with my stories. The first concerns Rails, which I’ve used for many years. Hanami and Redis were created by Italian programmers. :)&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Even though the projects I mentioned are monstrously complex, you can start by recreating simpler things.&lt;/p&gt;

&lt;p&gt;Here are some ideas for programmers who want to understand how things work under the hood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Implement a few algorithms. Developers don’t need to write sorting, search, or graph algorithms anymore, but they are exciting to code! Start easy: choose a sorting algorithm like Bubble Sort or Quick Sort, for example. Then continue with more complex ones, like Shortest Path in a Graph (Dijkstra’s).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create your version of Tetris, Pac-Man, or any simple platform game to practice the language you are learning. It is amazing how many useful concepts you can learn by creating simple games.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build your own &lt;a href="https://www.youtube.com/watch?v=Hncp0mPfUvk" rel="noopener noreferrer"&gt;web server from scratch&lt;/a&gt;. You’ll learn networking concepts like sockets, HTTP, and TCP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build an elastic collision engine or any physics simulation. A few years ago, while learning Golang, I wrote a &lt;a href="https://github.com/alexdesi/collisions" rel="noopener noreferrer"&gt;two-sphere collision simulator&lt;/a&gt;. It was fun and educational.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a simple 3D wireframe engine. This is great for refreshing your knowledge of matrix algebra.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a simple neural network from scratch. This is something I’d like to try. First, understand well what a neural network is, and then approach the code with a good tutorial. Just google "Neural Network from scratch Python."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
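&lt;p&gt;To make the first idea concrete, here is a plain Bubble Sort in Ruby (my own sketch, not code from the article):&lt;/p&gt;

```ruby
# Bubble Sort: repeatedly sweep the array, swapping adjacent
# out-of-order pairs, until a full sweep makes no swap.
def bubble_sort(items)
  a = items.dup
  loop do
    swapped = false
    (a.length - 1).times do |i|
      if a[i] > a[i + 1]
        a[i], a[i + 1] = a[i + 1], a[i]
        swapped = true
      end
    end
    break unless swapped
  end
  a
end

bubble_sort([5, 1, 4, 2, 8]) # => [1, 2, 4, 5, 8]
```

&lt;p&gt;It is slow (quadratic), which is exactly why writing it is instructive: you can then measure it against Quick Sort and see why the standard library does not use it.&lt;/p&gt;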

&lt;p&gt;Happy re-invention! And thanks for reading.&lt;/p&gt;

</description>
      <category>learning</category>
      <category>python</category>
      <category>ruby</category>
      <category>programming</category>
    </item>
    <item>
      <title>What's in your SW Developer toolbox?</title>
      <dc:creator>Alessandro De Simone</dc:creator>
      <pubDate>Sat, 16 Jul 2022 20:53:32 +0000</pubDate>
      <link>https://dev.to/alexdesi/whats-in-your-sw-developer-toolbox-dbd</link>
      <guid>https://dev.to/alexdesi/whats-in-your-sw-developer-toolbox-dbd</guid>
      <description>&lt;p&gt;Let's assume you know a programming language and are proficient with it. &lt;/p&gt;

&lt;p&gt;Is it enough to be a professional SW developer?&lt;/p&gt;

&lt;p&gt;No, it is not.&lt;br&gt;&lt;br&gt;
The programming language is your primary tool. It is the brush of the painter and the hammer of the carpenter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4znd6e43af9z8p1smalv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4znd6e43af9z8p1smalv.jpg" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
But you'd need to master other tools to collaborate with other developers and contribute to complex applications.&lt;/p&gt;

&lt;p&gt;In Italy, there is a saying which goes like this:&lt;br&gt;
"I ferri fanno il mastro"&lt;br&gt;
which means "the tools make the master craftsman".&lt;/p&gt;

&lt;p&gt;If you take it literally, it is not very accurate. You also need the experience to use the tools.&lt;/p&gt;

&lt;p&gt;In this context,  "having the tool" means being able to recognise when it's good to use a specific technology and to be able to solve the problem using it.&lt;/p&gt;

&lt;p&gt;What should be in the toolbox of a developer?&lt;br&gt;
It depends. Web developers, mobile developers, and data engineers (and so on) use different tools.&lt;br&gt;
For example, knowing about the ELT process might not matter for a web developer, but it's crucial for a data engineer. On the other hand, HTML and CSS are essential for a web developer.&lt;/p&gt;

&lt;p&gt;If you are a Ruby developer, at the very core, you must know:&lt;br&gt;
Ruby on Rails, RSpec, and a Ruby version manager (RVM, asdf, ...)&lt;br&gt;
and to fill a senior position, be sure to put a bunch of gems, good knowledge of OOP, and SQL in your toolbox.&lt;/p&gt;

&lt;p&gt;There is no limit to the tools you can have, but acquiring them takes time, and time is your real constraint.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Choose carefully the tools you want to put in your toolbox based on the career you want.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I've spent many months studying Data and ML to get the GCP Data Engineering professional certification.&lt;/p&gt;

&lt;p&gt;After a while, I realised that it was irrelevant to my career. I had a vague idea of moving toward a career in data engineering, but my approach was wrong.&lt;br&gt;&lt;br&gt;
A better way would have been to learn Python and ML libraries and start an actual project using them.&lt;/p&gt;

&lt;p&gt;The GCP Data certification does not add much to my employability. It's just not required for a Ruby engineer.&lt;br&gt;
What's in high demand for a profile like mine are DevOps and CI/CD skills, for example.&lt;/p&gt;

&lt;p&gt;After a few mistakes, I've developed a few ways to choose a new subject to learn. These are:&lt;/p&gt;

&lt;h2&gt;
  
  
  Notice what slows you down in your daily job
&lt;/h2&gt;

&lt;p&gt;Sometimes, you are already using a tool, but your knowledge is shallow, and you spend lots of time on Stack Overflow looking for answers, which you'll forget in a few days.&lt;br&gt;
It might be Git or SQL, for example. Find the time to master the tools that you use often.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyse the typical job specs
&lt;/h2&gt;

&lt;p&gt;Visit an online job board, and insert your job description in the search box.&lt;br&gt;
Then, open the results relevant to the position you are interested in.&lt;/p&gt;

&lt;p&gt;Your goal is to find the IT skills most present in the various job specs (or, in other words, the keywords with the highest frequency).&lt;/p&gt;

&lt;p&gt;Those are the skills that you should consider learning.&lt;/p&gt;
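&lt;p&gt;This keyword-frequency idea is easy to sketch in code. Here is a minimal Ruby version; the skill list and job specs below are made-up examples:&lt;/p&gt;

```ruby
# Count how often each skill keyword appears across a set of job specs.
# Both the skill list and the specs are hypothetical illustrations.
SKILLS = %w[ruby rails docker git sql react javascript].freeze

def skill_frequencies(job_specs)
  counts = Hash.new(0)
  job_specs.each do |spec|
    words = spec.downcase.split(/\W+/)
    SKILLS.each { |skill| counts[skill] += 1 if words.include?(skill) }
  end
  counts.sort_by { |_, n| -n }.to_h # most frequent skill first
end

specs = [
  "Ruby developer with Rails and SQL experience",
  "Rails engineer, Docker and Git required",
  "Backend developer: SQL, Docker, CI/CD"
]
```

&lt;p&gt;Running it over a real job board export would surface the same kind of ranking that the methods below produce at scale.&lt;/p&gt;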

&lt;h2&gt;
  
  
  Website showing the co-occurring IT Skills
&lt;/h2&gt;

&lt;p&gt;Similarly, some websites allow you to find skills related to a particular skill.&lt;/p&gt;

&lt;p&gt;The website &lt;em&gt;ItJobsWatch&lt;/em&gt; calls them "Co-occurring IT Skills", and it finds them by processing thousands of job specs.&lt;/p&gt;

&lt;p&gt;It is similar to the previous method but applied at scale.&lt;/p&gt;

&lt;p&gt;For example, the following link shows skills most related to Ruby On Rails:&lt;br&gt;
&lt;a href="https://www.itjobswatch.co.uk/jobs/uk/ruby%20on%20rails.do#related_skills" rel="noopener noreferrer"&gt;Ruby on Rails - Related skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JavaScript&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Git&lt;/li&gt;
&lt;li&gt;SQL&lt;/li&gt;
&lt;li&gt;React&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are indeed the most required skills for a Ruby On Rails developer.&lt;/p&gt;

&lt;p&gt;(Surprisingly, the list also shows Java, which is not correct, in my opinion.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Find out common skills
&lt;/h2&gt;

&lt;p&gt;The last technique is analysing a broader set of job descriptions and discovering which skills they have in common.&lt;/p&gt;

&lt;p&gt;For example, if you search for "Back End Developer", regardless of the programming language, you'll find the following skills (among others):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OOP (Object Oriented Programming)&lt;/li&gt;
&lt;li&gt;Testing&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Git&lt;/li&gt;
&lt;li&gt;SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those skills are fundamental because they will remain relevant in your career, even if you switch to a different programming language.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to make peace with Ruby on Rails and code happily</title>
      <dc:creator>Alessandro De Simone</dc:creator>
      <pubDate>Sun, 04 Jul 2021 11:57:19 +0000</pubDate>
      <link>https://dev.to/alexdesi/how-to-clean-up-messy-ruby-on-rails-applications-386f</link>
      <guid>https://dev.to/alexdesi/how-to-clean-up-messy-ruby-on-rails-applications-386f</guid>
      <description>&lt;p&gt;This post sums up few ideas from various articles and videos which helped me to realize why most of the big Ruby on Rails applications are hard to maintain.&lt;/p&gt;

&lt;p&gt;I've been working on small Ruby on Rails apps, gigantic monoliths and tangled microservices for about 10 years. &lt;/p&gt;

&lt;p&gt;Ruby on Rails takes full advantage of the flexibility of the Ruby language.&lt;/p&gt;

&lt;p&gt;Matz (Yukihiro Matsumoto) built the language to make life easy for developers, and even make them happy.&lt;/p&gt;

&lt;p&gt;In fact, Ruby's motto is &lt;em&gt;A Programmer's Best Friend&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I code in Ruby every day, and I have to admit, I rarely feel "happiness".&lt;/p&gt;

&lt;p&gt;Sometimes I am pleased with my solution, but most of the time I strive to gain an understanding of what the code is doing.&lt;/p&gt;

&lt;p&gt;Why does it happen?&lt;/p&gt;

&lt;p&gt;Shouldn't I feel joy while coding in a perfect status of flow?&lt;/p&gt;

&lt;p&gt;Let's start with a basic principle:&lt;/p&gt;

&lt;p&gt;To add or change functionality in an application, developers need to understand its code.&lt;/p&gt;

&lt;p&gt;Readable code is easy to understand and easy to change. &lt;/p&gt;

&lt;p&gt;Code is read many more times than it is written, so investing time to write readable code pays off, even in the short term. The fewer &lt;a href="https://alessandro.desi/code-smells" rel="noopener noreferrer"&gt;bad code smells&lt;/a&gt; in the code, the better.&lt;/p&gt;

&lt;p&gt;There are many books about writing good code, two good ones are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Practical Object-Oriented Design, by Sandi Metz.&lt;br&gt;&lt;br&gt;
This is my favourite book about writing maintainable Object-Oriented applications in Ruby.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactoring: Ruby Edition, by Jay Fields, Martin Fowler, and Shane Harvie.&lt;br&gt;&lt;br&gt;
This is the Ruby version of the most famous book about &lt;a href="https://alessandro.desi/code-smells" rel="noopener noreferrer"&gt;bad code smells&lt;/a&gt; and how to refactor them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issues present in most of the Rails apps I've worked with are linked to the topics of these two books: poor &lt;em&gt;OOP&lt;/em&gt; and &lt;em&gt;bad code smells&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;These are the most common issues I've seen in many Rails apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fat model&lt;/li&gt;
&lt;li&gt;Fat controller&lt;/li&gt;
&lt;li&gt;Logic in the view&lt;/li&gt;
&lt;li&gt;Lack of a clear design&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Fat Model
&lt;/h2&gt;

&lt;p&gt;The responsibility of a model is to persist data in a DB table and manage associations with other models.&lt;/p&gt;

&lt;p&gt;It is very easy to add functionalities to them, and common to see models named &lt;code&gt;User&lt;/code&gt;, &lt;code&gt;Company&lt;/code&gt; or &lt;code&gt;Booking&lt;/code&gt; having tens of methods and many hundreds of lines.&lt;/p&gt;

&lt;p&gt;Inside those fat models, you can find validation, policy, business, and view logic.&lt;/p&gt;

&lt;p&gt;Also, they are often littered with callbacks, which turn the model into a bowl of spaghetti code.&lt;/p&gt;

&lt;p&gt;There are a few patterns you can follow to keep models clean. Here is a helpful post that describes &lt;a href="https://codeclimate.com/blog/7-ways-to-decompose-fat-activerecord-models/" rel="noopener noreferrer"&gt;7 Patterns to Refactor Fat ActiveRecord Models&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fat Controller
&lt;/h2&gt;

&lt;p&gt;Fat controllers are even worse than fat models.&lt;br&gt;&lt;br&gt;
The controller is the home of the actions linked to the HTTP calls. &lt;/p&gt;

&lt;p&gt;The only duty of a controller action should be to delegate the work and render the result (or redirect to a different page).&lt;/p&gt;

&lt;p&gt;Often the controller actions contain logic about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;business rules&lt;/li&gt;
&lt;li&gt;database access&lt;/li&gt;
&lt;li&gt;presentation logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When that happens, the actions become long, confusing, and hard to maintain. The tests also end up long and messy.&lt;/p&gt;

&lt;p&gt;Moving all the business logic to a Service object is a step forward.&lt;br&gt;&lt;br&gt;
The controller action will call the service and then render/redirect with the proper HTTP status based on the service result.&lt;/p&gt;
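&lt;p&gt;Here is a minimal sketch of that shape, with a hypothetical &lt;code&gt;CreateBooking&lt;/code&gt; service written in plain Ruby (a real app would use ActiveRecord inside the service):&lt;/p&gt;

```ruby
# A hypothetical service object: one business action, one public #call,
# returning a simple result the controller can branch on. Plain Ruby
# here; in a real app the body would create an ActiveRecord model.
class CreateBooking
  Result = Struct.new(:success, :booking, :error)

  def initialize(params)
    @params = params
  end

  def call
    name = @params[:name].to_s.strip
    return Result.new(false, nil, "name required") if name.empty?

    booking = { name: name } # stand-in for Booking.create!(...)
    Result.new(true, booking, nil)
  end
end

# The controller action is reduced to delegation and rendering:
#   result = CreateBooking.new(booking_params).call
#   if result.success
#     redirect_to result.booking
#   else
#     render :new, status: :unprocessable_entity
#   end
```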

&lt;p&gt;Beware of misusing Service Objects: they do not provide a good abstraction for your business domain, they are just procedures.&lt;/p&gt;

&lt;p&gt;Jason Swett summarised well why using Service Object should not be the rule in &lt;a href="https://www.codewithjason.com/rails-service-objects/" rel="noopener noreferrer"&gt;Beware of service objects in Rails&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Avdi Grimm suggests that it is fine to write procedures instead of services, especially when the domain concepts are not yet well defined.&lt;br&gt;&lt;br&gt;
He wrote about this in &lt;a href="https://avdi.codes/service-objects/" rel="noopener noreferrer"&gt;Enough With the Service Objects Already&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logic in the view
&lt;/h2&gt;

&lt;p&gt;The view should contain as little logic as possible.&lt;/p&gt;

&lt;p&gt;Mixing up Ruby code with HTML code makes the views unreadable and hard to test.&lt;/p&gt;

&lt;p&gt;Messy Rails views often contain one or more of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nested conditionals&lt;/li&gt;
&lt;li&gt;Calculations&lt;/li&gt;
&lt;li&gt;DB queries&lt;/li&gt;
&lt;li&gt;Variable assignments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, you need to keep those outside the views.&lt;br&gt;&lt;br&gt;
Two common techniques to help you keep the views clean are Decorators and View Objects.&lt;/p&gt;

&lt;p&gt;Here is an article explaining what they are and why moving the logic to the models is not a solution: &lt;a href="https://jtway.co/cleaning-up-your-rails-views-with-view-objects-42cf048ea491" rel="noopener noreferrer"&gt;Cleaning Up Your Rails Views With View Objects&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lack of a clear design
&lt;/h2&gt;

&lt;p&gt;This is the result of the previous three subjects.&lt;br&gt;&lt;br&gt;
If the business logic pervades the models, controllers and even views, you'll have a hard time understanding what is the business domain of the application. The design is abstract and entangled with the concepts of Ruby on Rails, and business concepts are mixed with those of the framework.&lt;/p&gt;

&lt;p&gt;A possible solution is to adopt design patterns like Clean Architecture or Hexagonal Architecture up front.&lt;/p&gt;

&lt;p&gt;Those patterns are not convenient for small Rails apps, but they pay off for big projects.&lt;/p&gt;

&lt;p&gt;However, note that if the application eventually succeeds, it will grow, and more functionalities will be added by you and your workmates. If it lacks a clear design, you will have a hard time maintaining it.&lt;/p&gt;

&lt;p&gt;At the GoRuCo 2012 conference, Matt Wynne gave a great talk about &lt;a href="https://www.youtube.com/watch?v=CGN4RFkhH2M" rel="noopener noreferrer"&gt;Hexagonal Rails&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Ruby is a flexible and powerful language and it should be a pleasure coding with it.&lt;/p&gt;

&lt;p&gt;Often big messy Rails applications (and not just Rails) make your life as a developer really hard.&lt;/p&gt;

&lt;p&gt;That should not be the norm. There are various strategies that you can follow to keep your project readable and maintainable, and in fact, there are many successful start-ups based on beautifully coded Rails applications.&lt;/p&gt;

&lt;p&gt;I hope you've found this post and its links helpful.&lt;/p&gt;




&lt;p&gt;This was initially published on my blog here: &lt;a href="https://alessandro.desi/clean-rails" rel="noopener noreferrer"&gt;How to make peace with Ruby on Rails and code happily&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
    </item>
  </channel>
</rss>
