DEV Community: Javier Aguirre

Deep Learning for Beginners: A Complete Guide

Javier Aguirre — Fri, 29 May 2026 03:31:41 +0000

You've heard the words. Neural networks. Deep learning AI. Transformers. Maybe you've even heard that deep learning is what powers ChatGPT, image generators, voice assistants, and self-driving cars. And now you're wondering: what actually is it?

This guide is the honest, comprehensive answer. Not watered down. Not padded with fluff. Just the core ideas, the key architectures, and the intuition you need to actually understand what's going on under the hood, explained clearly enough that a beginner can follow it, and thoroughly enough that it stays useful as you go deeper.

If you haven't read our post on machine learning basics yet, I'd recommend starting there. Deep learning builds directly on top of machine learning, and a few concepts from that post will make this one click much faster. That said, let's get into it.

What Is Deep Learning?

Deep learning is a branch of machine learning that uses artificial neural networks (systems loosely inspired by the structure of the human brain) to learn patterns from data.

The word "deep" refers to depth: the number of layers stacked inside the network. A shallow network might have one or two layers. A deep network might have dozens or, in the case of modern large language models, hundreds. Each layer transforms the data slightly, learning increasingly abstract representations as you move through the stack.

Here's the key insight that separates deep learning from classical machine learning: traditional ML algorithms need humans to engineer features manually. You decide what inputs matter. In deep learning, the network learns its own features directly from raw data (pixels, waveforms, text characters) without being told what to look for.

That's what makes it so powerful. And that's what makes it so data-hungry.

"Deep learning is not magic. It's a very large function with millions of parameters, trained on enormous amounts of data, that has learned to approximate patterns no human could write by hand."

What Is Deep Learning vs Machine Learning?

It's a nested relationship, not a competition. All deep learning is machine learning, but not all machine learning is deep learning.

Classical machine learning (decision trees, random forests, gradient boosting) works well on structured, tabular data. It's interpretable, efficient, and still dominant across most real-world business applications.

Deep learning takes over when the data is unstructured: images, audio, video, raw text. These formats have too many dimensions and too much complexity for traditional algorithms to handle well. Neural networks, with their layered feature learning, are built exactly for this.

Use classical ML for credit scoring, demand forecasting, fraud detection, churn prediction: structured tables, interpretability required
Use deep learning for image recognition, speech, language understanding, video: unstructured data, pattern complexity at scale

The honest take: deep learning is not always better. It requires significantly more data and compute. When you have a clean tabular dataset and a clear prediction task, XGBoost will often beat a neural network and train in seconds rather than hours.

Neural Network Basics

Before we get into specific architectures, you need to understand the building blocks. Every deep learning model, regardless of how complex, is built from the same core components.

Neurons, Layers, and Activation Functions

A neuron is the basic unit. It takes a set of inputs, multiplies each by a learned weight, sums everything up, and passes the result through an activation function. That output feeds into the next layer.

The image here compares a real neuron with an artificial one (on the right). You do not need to understand the formula; just keep the following in mind: three inputs arrive at the neuron (these are the output values of neurons in the previous layer), and each one is multiplied by its own weight. The formula inside looks scary but, trust me, it is not: it is basically a fancy way of saying "let's sum them all up!", plus one more thing I will describe in a moment, which is the activation function applied to that sum.

That's all!
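To make the "sum them all up" idea concrete, here is a minimal sketch of a single neuron in plain Python. The input and weight values are made up for illustration, and the optional `bias` term is a standard addition that the description above does not mention.

```python
def relu(z):
    """ReLU activation: zero for negative input, unchanged otherwise."""
    return max(0.0, z)


def neuron(inputs, weights, activation, bias=0.0):
    """Multiply each input by its weight, sum everything, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Hypothetical values: three outputs from the previous layer and three weights.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]

output = neuron(inputs, weights, relu)
print(output)  # roughly 0.1 (0.2 - 0.3 + 0.2)
```

During training, the weights (and the bias, if used) are the numbers that get adjusted; the structure of the computation stays the same.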

So... what is an activation function? The activation function is what gives neural networks their power. Without it, stacking layers would be mathematically equivalent to having just one layer: you'd just be chaining linear transformations, and a chain of linear transformations is itself a single linear transformation. Activation functions introduce non-linearity, which is what allows the network to learn complex patterns. If you did not completely get it, that is okay: this is the most mathematical part of the explanation!

The three you'll see everywhere:

ReLU (Rectified Linear Unit): outputs zero for negative inputs, passes positive inputs through unchanged. Simple, fast, and the default choice for hidden layers in most networks. The fact that something this simple works so well is one of the quiet surprises of deep learning.
Sigmoid: squashes output to a value between 0 and 1. Used in binary classification output layers where you want a probability.
Softmax: extends sigmoid to multiple classes. Takes a vector of raw scores and converts them into probabilities that sum to 1. Used in the final layer of any multi-class classifier.

So basically, you can see the activation function as a filter that decides what value the neuron will output after summing up the values coming from the previous neurons. And there are different ways to do that filtering.
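The three activation functions listed above fit in a few lines each. This is a plain-Python sketch for intuition, not how a deep learning library implements them; the max-subtraction inside `softmax` is a common numerical-stability trick that does not change the result.

```python
import math


def relu(z):
    """Zero for negative inputs, the input itself otherwise."""
    return max(0.0, z)


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-3.0), relu(2.5))
print(sigmoid(0.0))
print(softmax([2.0, 1.0, 0.1]))
```

Note how `softmax` keeps the ranking of the scores: the largest raw score always gets the largest probability.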

Feedforward Networks: How Information Flows

Now you know how an artificial neuron works, great! The next step is stacking neurons into layers, and that gives us a neural network. In the simplest kind of neural network, information travels in one direction only: input → hidden layers → output. No loops, no memory, no feedback. Each layer is fully connected to the next: every neuron in layer N connects to every neuron in layer N+1.

This is called a feedforward network, and it's the foundation that every other architecture builds on top of or departs from. (Yes, including ChatGPT, Claude and other Transformer-based models. Here is where it all starts.)
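As a rough sketch of that one-directional flow, here is a tiny feedforward network with two inputs, one hidden layer of three neurons, and one output. All weights are hand-picked for illustration; a real network would learn them.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical weights: one row per neuron, one column per input.
hidden_weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
hidden_biases = [0.0, 0.0, 0.0]
output_weights = [[1.0, 1.0, 1.0]]
output_biases = [0.0]


def forward(x):
    """input -> hidden layer -> output, with no loops and no memory."""
    hidden = dense(x, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, identity)


y = forward([1.0, 2.0])
print(y)  # [4.5]
```

The hidden layer produces `[0.0, 1.5, 3.0]` for this input (the first neuron is silenced by ReLU), and the output neuron adds those up.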

Backpropagation: How the Network Actually Learns

Training a neural network means finding the right weights. You do this by:

1. Making a prediction with the current weights
2. Measuring how wrong it was (the loss, remember this term)
3. Computing how much each weight contributed to that error
4. Nudging every weight slightly in the direction that reduces the loss

Step 3 is backpropagation, the algorithm that efficiently computes the gradient of the loss with respect to every weight in the network, propagating the error signal backward from the output layer to the input. Step 4 is gradient descent, the optimiser that uses those gradients to update the weights.

This loop (forward pass, compute loss, backward pass, update weights) repeats for millions or billions of iterations during training. That's how a network goes from random noise to something that can recognise faces, translate languages, or generate code.
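Here is that loop on the smallest model I can think of: a single weight `w` that should learn y = 2x. With only one weight, the "backward pass" is a single derivative worked out by hand; in a real network, backpropagation computes the same kind of gradient for every weight automatically. The data and learning rate are made up for illustration.

```python
# Toy data that follows y = 2 * x, so the ideal weight is 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # start from an arbitrary weight
learning_rate = 0.05  # how big each nudge is

for step in range(200):
    # 1. Forward pass: predict with the current weight.
    predictions = [w * x for x, _ in data]

    # 2. Loss: mean squared error between predictions and targets.
    loss = sum((p - y) ** 2 for p, (_, y) in zip(predictions, data)) / len(data)

    # 3. Backward pass: derivative of the loss with respect to w.
    gradient = sum(2 * (p - y) * x for p, (x, y) in zip(predictions, data)) / len(data)

    # 4. Gradient descent: move w against the gradient.
    w -= learning_rate * gradient

print(w, loss)
```

After 200 steps `w` sits very close to 2.0 and the loss is practically zero. Try a learning rate of 0.3 to watch training blow up instead.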

Important note: it is important to understand how the neuron, backpropagation, gradient descent, and losses work. You will not need to do the math by hand in practice, but the theory helps you grasp what you are doing. Do not get stuck on it, but if you can learn it, I highly recommend it.

I decided to include it in this post so you know it exists, but this part and the next two (weight initialisation and batch normalisation) can be skipped, since they are oriented more towards foundational knowledge than towards practice.

Give it a quick read. If you do not fully get it, keep going and don't worry!

Weight Initialisation

How you set the initial weights before training matters more than most tutorials admit. Start them all at zero and the network won't learn: every neuron computes the same thing and the gradients are identical. Start them too large and training becomes unstable. Smart initialisation schemes (Xavier, He initialisation) are designed to keep signal flowing cleanly through the network from the start.
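For the curious, here is a sketch of the two schemes named above, using the formulas they are usually quoted with: Xavier (Glorot) uniform draws from a range of ±sqrt(6 / (fan_in + fan_out)), and He normal uses a standard deviation of sqrt(2 / fan_in). The function names and layer sizes are my own choices for illustration.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: scale depends on both input and output size."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He normal: scale depends on input size, commonly paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)  # fixed seed so the example is repeatable
xavier_weights = xavier_uniform(fan_in=4, fan_out=3, rng=rng)
he_weights = he_normal(fan_in=4, fan_out=3, rng=rng)

print(len(xavier_weights), len(xavier_weights[0]))  # 3 neurons, 4 weights each
```

The point of both is the same: the more inputs a neuron has, the smaller each initial weight should be, so the sum does not explode.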

Batch Normalisation

As networks get deeper, a problem emerges: the distribution of activations shifts during training, making learning unstable and slow. Batch normalisation addresses this by normalising the inputs to each layer across a mini-batch, keeping activations in a stable range. It's one of those techniques that felt like a trick when it was introduced and turned out to be foundational: it, or a close relative such as layer normalisation, is now standard in almost every deep architecture.
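A minimal sketch of the normalising step, for one feature across a mini-batch. `gamma` and `beta` are the learnable scale and shift that batch normalisation usually carries; here they are fixed at 1 and 0, and the running statistics used at inference time are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch to roughly zero mean, unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


# Hypothetical activations of one neuron for a mini-batch of four examples.
activations = [1.0, 2.0, 3.0, 4.0]
normalised = batch_norm(activations)
print(normalised)
```

Whatever scale the raw activations had, the next layer now sees values centred on zero with a spread of about one.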

Key Architectures

Now for the interesting part. If you made it this far, congratulations! This is where the fun begins, so keep reading!

Deep learning is not one thing: it's a family of architectures, each designed for a different kind of data and a different kind of problem. Here are the ones worth knowing.

Feedforward Neural Networks (FFNN)

The simplest deep network. Fully connected layers, information flows in one direction, no special structure. This is the architecture that introduces every concept (neurons, activations, backpropagation) in its clearest form.

In practice, pure FFNNs are rarely used for complex tasks. Images have spatial structure that FFNNs ignore. Sequences have temporal dependencies that FFNNs can't capture. But understanding the FFNN deeply is non-negotiable before moving to anything else.

When you'd use it: tabular data, simple classification and regression tasks, as a component inside larger architectures.

Convolutional Neural Networks (CNNs)

CNNs are the architecture that put deep learning on the map. In 2012, a CNN called AlexNet won the ImageNet competition by a margin so large it ended the debate about whether deep learning worked. It did.

The key idea: instead of connecting every neuron to every pixel (computationally insane for large images), CNNs apply small filters that slide across the input, detecting local patterns. Early layers learn to detect edges and textures. Later layers combine those into shapes, objects, faces.

This design is efficient, spatially aware, and extraordinarily effective on anything that has grid-like structure: images, video frames, certain kinds of audio.
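To see the sliding filter in action, here is a bare-bones sketch with no padding and a stride of 1. The image and the filter are hand-made for illustration: the image is dark on the left and bright on the right, and the filter responds to exactly that kind of vertical edge. In a CNN the filter values are learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows)
                for dj in range(k_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# A 4x4 "image": dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 2x2 filter that fires where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The same four filter weights are reused at every position, which is why this is so much cheaper than connecting every neuron to every pixel.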

When you'd use it: image classification, object detection, medical imaging, video analysis, any problem where spatial patterns matter.

Recurrent Neural Networks (RNNs) and LSTMs

What if your data is a sequence (a sentence, a time series, an audio clip) where the order of elements matters?

FFNNs and CNNs don't have memory. They process each input independently. RNNs fix this by feeding the hidden state from the previous step into the current step, giving the network a form of short-term memory.

In theory, this lets RNNs capture long-range dependencies. In practice, they struggle: gradients either explode or vanish as they travel through many time steps, making it hard to learn patterns that span long sequences.
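Both ideas, the hidden state carried from step to step and the shrinking gradient, show up even in a one-number RNN. This sketch uses a single scalar hidden state with a tanh activation; the weights are made up, and real RNNs use vectors and matrices instead.

```python
import math


def rnn_forward(inputs, w_x, w_h, bias=0.0):
    """Scalar RNN: each hidden state depends on the current input and the previous state."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states


def gradient_through_time(states, w_h):
    """How much the last state depends on the initial one: one factor per time step."""
    gradient = 1.0
    for h in states:
        gradient *= w_h * (1.0 - h * h)  # derivative of tanh(...) w.r.t. the previous state
    return gradient


# Same values, different order: the final hidden state differs, so order matters.
early = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5)
late = rnn_forward([0.0, 0.0, 1.0], w_x=1.0, w_h=0.5)
print(early[-1], late[-1])

# Over 50 steps the gradient is a product of 50 small factors and all but vanishes.
long_states = rnn_forward([1.0] * 50, w_x=1.0, w_h=0.5)
print(gradient_through_time(long_states, w_h=0.5))
```

With a recurrent weight well above 1 the same product can grow instead of shrink, which is the exploding side of the problem.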

LSTMs (Long Short-Term Memory networks) solve this with a more sophisticated memory mechanism: gates that control what information to keep, what to forget, and what to output. LSTMs were the state of the art for language and sequence tasks for years before Transformers arrived.

Figure: the internal structure of a single LSTM cell.

Do not worry, you will never have to implement this from scratch :)

Again, it is just good to know this exists for future use.

LSTMs are not obsolete: they're still used in production systems where efficiency matters and sequences are modest in length. But for most language tasks, Transformers have superseded them.

Figure: an entire LSTM network.

When you'd use them: time series forecasting, speech recognition (in resource-constrained settings), sensor data, any ordered sequence where Transformers would be overkill.

Autoencoders and VAEs

An autoencoder is trained to compress an input into a smaller representation (the latent space) and then reconstruct it back to the original. The bottleneck forces the network to learn the most essential features of the data.

Variational Autoencoders (VAEs) extend this by learning a probability distribution over the latent space rather than a fixed point. This makes the latent space continuous and structured, which means you can sample from it to generate new data, not just reconstruct existing inputs.

VAEs were an early serious approach to generative modelling and introduced ideas (latent space, encoder-decoder structure) that appear throughout modern AI.

When you'd use them: anomaly detection, data compression, generative modelling, representation learning, synthetic data generation.

GANs (Generative Adversarial Networks)

GANs are one of the most creative ideas in all of deep learning. The setup: two networks trained in opposition.

The generator produces fake data (images, audio, whatever the domain). The discriminator tries to tell real from fake. As training progresses, the generator gets better at fooling the discriminator, and the discriminator gets better at detecting fakes. Each improves in response to the other.

When it works, GANs produce extraordinarily realistic outputs. They dominated image synthesis for years and the photorealistic faces you may have seen on sites like "This Person Does Not Exist" are GAN-generated.

They're notoriously difficult to train: mode collapse, training instability, and sensitivity to hyperparameters make them frustrating in practice. Diffusion models have largely superseded them for image generation, but the adversarial training concept remains influential.

When you'd use them: image synthesis, data augmentation, style transfer, domain adaptation.

Diffusion Models

Diffusion models are the architecture behind Stable Diffusion, DALL-E 2, and most of the state-of-the-art image generators you've seen. The idea is elegant and counterintuitive.

Training: take real images and gradually add Gaussian noise until they're pure static. Teach the network to reverse this process, to predict and remove the noise at each step.

Generation: start with pure random noise and run the learned denoising process repeatedly until a coherent image emerges.
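The noising half of training is easy to sketch. This assumes the closed-form step used in DDPM-style diffusion models, where a single number `alpha_bar` (between 1 and 0) says how much of the original signal is left. The "image" is just four made-up pixel values, and the network that learns to predict the noise is not shown.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Blend a clean sample with Gaussian noise; smaller alpha_bar means more noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(x0, noise)
    ]
    return noisy, noise  # the network would be trained to predict `noise` from `noisy`


rng = random.Random(0)           # fixed seed so the example is repeatable
pixels = [0.9, 0.1, -0.5, 0.3]   # hypothetical tiny "image"

for alpha_bar in (1.0, 0.5, 0.0):
    noisy, _ = add_noise(pixels, alpha_bar, rng)
    print(alpha_bar, noisy)
```

At `alpha_bar = 1.0` the image is untouched, and at `0.0` nothing but static is left. Generation runs the learned denoiser in the opposite direction.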

Diffusion models produce higher quality, more diverse outputs than GANs and train more stably. They're computationally heavier at inference time (many denoising steps required), but the quality improvement has made the trade-off worth it for most applications.

When you'd use them: image generation, video generation, audio synthesis, any high-quality generative task.

Personal Note

The last three (VAEs, GANs and diffusion models) are great for generative tasks, including synthetic data generation. Currently diffusion models have taken over the space thanks to their output quality and training stability.

In 2023 we published research, a collaboration between Samsung Advanced Institute of Health Science and Technology (SAIHST), Samsung Medical Center (SMC), Yonsei Severance Hospital and Google Cloud (USA), comparing all three for synthetic data generation in healthcare settings. If you are interested in the topic, click here.

The Transformer

Everything changed in 2017 when a Google paper titled "Attention Is All You Need" introduced the Transformer architecture. GPT, BERT, DALL-E, Whisper, Stable Diffusion: almost every major model of the last several years is built on top of it or borrows from it directly.

The core innovation: self-attention. Instead of processing a sequence step by step (like an RNN), the Transformer processes all positions simultaneously and lets each position directly attend to every other position. This largely removes the long-range dependency problem and, critically, allows full parallelisation during training.

Self-Attention and Multi-Head Attention

Self-attention allows the model to weigh how relevant each word (or token) is to every other word when building a representation. In the sentence "The bank by the river was steep," the word "bank" needs to attend strongly to "river" to resolve its meaning correctly. Self-attention learns to do this.

Multi-head attention runs several self-attention operations in parallel, each learning to attend to different kinds of relationships simultaneously. One head might track syntactic structure; another might track semantic similarity. The outputs are combined and projected forward.
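Here is a stripped-down sketch of a single attention head on the "bank / river" example. To keep it short, it skips the learned query, key and value projections that a real Transformer has and uses the raw token vectors for all three. The two-number embeddings are invented so that "bank" and "river" point in a similar direction.

```python
import math


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    all_weights, outputs = [], []
    for query in tokens:
        # How similar is this token to every token in the sentence?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # New representation: a weighted mix of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return all_weights, outputs


# Hypothetical 2-dimensional embeddings.
embeddings = {"bank": [1.0, 1.0], "river": [1.0, 0.8], "steep": [-1.0, 0.2]}
words = list(embeddings)
weights, outputs = self_attention([embeddings[w] for w in words])

for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

In the row for "bank", the weight on "river" is much larger than the weight on "steep". Multi-head attention would run several of these side by side, each with its own learned projections.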

Positional Encoding

Transformers have no built-in sense of order: on its own, self-attention treats its input as an unordered set of tokens. Positional encoding fixes this by adding information about each token's position in the sequence before it enters the network. The model learns to use this position signal to understand order, proximity, and structure.
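One concrete scheme is the sinusoidal encoding from the original Transformer paper, sketched below. It is only one option (many models learn their position vectors instead), and the sequence length and dimension here are arbitrary.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions shares one wavelength.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


encoding = positional_encoding(seq_len=4, d_model=4)
print(encoding[0])  # [0.0, 1.0, 0.0, 1.0]
print(encoding[1])
```

Each row is added to the embedding of the token at that position, so two identical words at different positions no longer look identical to the model.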

Encoder vs. Decoder vs. Encoder-Decoder

Not all Transformers are the same. There are three architectural variants:

Encoder-only (e.g. BERT): reads the full sequence bidirectionally, building rich contextual representations. Best for tasks that require understanding: classification, named entity recognition, semantic search.
Decoder-only (e.g. GPT): generates tokens one at a time, each attending only to previous tokens. Best for generation: writing, code, conversation.
Encoder-decoder (e.g. T5, original Transformer): encodes an input sequence, then decodes an output sequence. Best for transformation tasks: translation, summarisation, question answering.

Understanding which variant you're working with (and why it was chosen) is one of the most practically useful things you can know when working with modern AI.

What Comes Next

You now have a map of the deep learning landscape: the building blocks, the key architectures, when to use each, and why they exist. That's the conceptual foundation.

The practical path from here:

Get hands-on with PyTorch or TensorFlow: implement a simple FFNN, then a CNN on image data. Seeing the training loop in code cements everything.
Work through a sequence task: build or use an LSTM on a real time series dataset.
Study the Transformer in depth: read "Attention Is All You Need" after you've built intuition. It will make sense now in a way it wouldn't have before.
Explore modern applications: fine-tune a pretrained model, experiment with diffusion pipelines, build something that uses what you've learned.

If you're wondering where deep learning fits in the bigger picture (how it relates to machine learning and where Generative AI comes in) check out our AI learning roadmap for the full view.

The architecture names will start feeling familiar quickly. Build things. Break them. Figure out why. That's the actual learning.

If you want to learn more, we have more content in our blog here!

Machine Learning Basics: Core Concepts Explained Simply

Javier Aguirre — Fri, 29 May 2026 03:22:57 +0000

If you've heard the term "machine learning" thrown around but still aren't sure what it actually means, you're in the right place. This isn't a roadmap for learning machine learning (we covered that here). This is the conceptual foundation: the ideas, the vocabulary, and the mental models you need so that everything else clicks.

Think of it as the "what" before the "how."

Let's get into it!

What Is Machine Learning?

Machine learning is a way of building systems that learn from data rather than following hand-written rules.

In traditional programming, a developer writes explicit instructions: if X, do Y. Every scenario must be anticipated and coded manually. Machine learning flips this model entirely. Instead of writing the rules yourself, you feed the system a large collection of examples (data where you already know the outcome) and the algorithm figures out the rules on its own.

A concrete way to think about it: imagine you want a computer to recognise photos of cats. You could try to write rules: "look for pointy ears, whiskers, fur." But edge cases multiply fast. What about a cartoon cat? A sleeping cat? A hairless breed?

Machine learning sidesteps the rule-writing problem entirely. You show the model thousands of labelled photos ("cat" / "not a cat"), it extracts the underlying patterns, and it generalises that knowledge to photos it's never seen before.

That pattern-recognition loop (examples in, predictions out) is what machine learning is, at its core.

Machine learning, in simple words:
"a system that gets smarter the more data it sees, rather than following a fixed set of rules."

What Is Machine Learning Used For?

Machine learning is already embedded in most of the software you use daily, though the reality is more layered than the usual list of examples suggests.

Credit risk and underwriting: banks use gradient boosted trees and logistic regression to assess lending decisions, because income, debt, history, and geography interact in ways too complex for manual rules.

Fraud detection: modern fraud systems combine anomaly detection, graph ML (to surface fraud rings across networks), and rule-based filters working in tandem. Patterns are adversarial and constantly shifting, which is exactly where ML earns its place.

Search ranking: retrieval uses indexing and heuristics, but ranking is heavily learned. Models predict which result a specific user is most likely to find useful based on signals from billions of past interactions.

Advertising and recommendations: arguably the largest economic application of ML on earth. Predicting click-through rate, conversion probability, and long-term user value drives enormous commercial value across every major platform.

Demand forecasting: retailers, energy grids, and supply chains use ML to predict inventory needs, consumption patterns, and logistics requirements. Gradient boosted trees and hybrid statistical models dominate here.

Anomaly detection: server monitoring, cybersecurity logs, and industrial sensors all use ML to flag behaviour that deviates from learned baselines. Isolation forests and autoencoders are workhorses in this space.

Marketplace matching: job platforms, dating platforms, ride-sharing, and marketplaces use ML to predict compatibility between two entities: driver and rider, candidate and role, buyer and listing.

The common thread: ML works best when the rules are too complex to write by hand, the environment shifts over time, and there's feedback data at scale to learn from.

The Three Types of Machine Learning

Not all machine learning works the same way. Understanding the three core categories is fundamental to understanding machine learning properly.

Supervised Learning

Supervised learning is the most common type and the best starting point for beginners.

The model trains on labelled data: every example in the training set comes with the correct answer attached. A spam filter trains on emails labelled "spam" or "not spam." A house price model trains on historical sales records where the price is already known.

What is supervised learning in practice?

The model makes predictions, compares them to the correct labels, measures the error, and adjusts. Repeat this millions of times across thousands of examples and the model gradually gets accurate. At inference time (when it sees new, unlabelled data) it applies everything it learned.

Supervised learning covers two main tasks:

Classification: predicting a category (spam/not spam, disease/no disease, churn/retain)
Regression: predicting a number (house price, temperature, revenue)

Unsupervised Learning

With unsupervised learning, there are no labels. The model receives raw data and must discover structure on its own.

The most common application is clustering: grouping data points that are similar to each other. Customer segmentation works this way: you feed the model purchase history, browsing behaviour, and demographics, and it discovers natural groupings without anyone telling it what those groups should be.
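As a small illustration, here is k-means, one common clustering algorithm (my pick for the example, not the only option), run on a single made-up feature: monthly spend per customer. The starting centroids are chosen by hand to keep the example repeatable.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend for six customers. No labels anywhere.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 100.0])

print(clusters)   # [[12, 15, 14], [90, 95, 88]]
print(centroids)
```

Nobody told the algorithm there were "low spenders" and "high spenders"; the two groups fall out of the data.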

Other unsupervised applications include anomaly detection (spotting data points that don't fit the pattern) and dimensionality reduction (compressing complex data into simpler representations without losing key information).

Reinforcement Learning

Reinforcement learning is the odd one out. It doesn't learn from a fixed dataset at all.

Instead, an agent takes actions in an environment and receives feedback: rewards for good outcomes, penalties for bad ones. Over time, through trial and error, it learns the strategy that maximises reward.
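The trial-and-error loop can be sketched with about the simplest reinforcement learning setup there is: a two-armed bandit with an epsilon-greedy agent. The reward probabilities, step count and epsilon are all made up; the agent never sees the probabilities, only the rewards.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly pick the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was taken
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Action 1 pays off far more often than action 0; the agent has to discover that.
values, counts = run_bandit([0.2, 0.8])
print(values)
print(counts)
```

By the end, most of the pulls go to the better action. Systems like AlphaGo are vastly more sophisticated, but the feedback loop (act, get a reward, update) is the same.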

This is how DeepMind's AlphaGo mastered the game of Go, how robotics systems learn to walk, and how some recommendation engines learn to keep users engaged. It's one of the most exciting areas in machine learning today and also one of the most complex.

What Is the Difference Between Machine Learning and Deep Learning?

This trips up a lot of people new to the field. The short answer: deep learning is a subset of machine learning. Some of my AI engineering colleagues would disagree, saying it is a field of its own that comes after machine learning. However, the exact relationship doesn't matter much: whether it is the child or the sibling of machine learning, it is definitely a close relative.

Then... what is that relationship? Well, classic machine learning algorithms (linear regression, decision trees, random forests...) work by finding mathematical relationships in structured data. They're transparent, fast, and still dominant in most real-world business applications.

Deep learning uses artificial neural networks with many layers (hence "deep"). Each layer learns increasingly abstract representations: an early layer of an image model might learn to detect edges; a later layer might learn to detect faces. This layered abstraction is what gives deep learning its power on complex, unstructured data.

The bottom line: when someone says "we use machine learning," they may or may not mean deep learning. When someone says "we use deep learning," that's always a subset of machine learning.

So, what you should remember is that both have the same purpose: building a system that gets smarter the more data it sees rather than following a fixed set of rules. Deep learning, however, tends to require more data and compute, which makes it less efficient, but it can do better on harder cases.

Key Terms You'll Keep Seeing

Understanding machine learning means getting comfortable with a core vocabulary. Here are the terms that come up constantly:

Training data: the dataset the model learns from. The quality and size of this data is the single biggest factor in model performance.

Model: the mathematical function that maps inputs to outputs after training. When people say "we trained a model," this is what they mean.

Features: the input variables the model uses to make predictions. In a house price model, features might include square footage, number of bedrooms, and postcode.

Labels: the correct output values in supervised learning. The "answers" in the training data.

Training: the process of exposing a model to data and letting it adjust its internal parameters to minimise error.

Overfitting: when a model learns the training data too well, including its noise and quirks, and fails to generalise to new data. A model that scores 99% on training data and 60% on real data has overfit.

Underfitting: the opposite problem. The model is too simple to capture the real patterns in the data.

Hyperparameters: settings you choose before training begins (number of trees in a forest, learning rate, number of layers). Distinct from parameters, which the model learns during training.

Why Machine Learning Works (The Intuition)

At the heart of basic machine learning is a deceptively simple idea: generalisation.

A model that just memorised its training data would be useless: you'd already have that data. What you want is a model that has learned something general enough to make accurate predictions on data it has never encountered.

The way models achieve this is by minimising a loss function, a mathematical measure of how wrong their predictions are. During training, the algorithm repeatedly adjusts the model's internal parameters in the direction that reduces loss. After enough iterations across enough data, the model has found a set of parameters that capture the underlying structure of the problem.

This is why data quality matters so much. Garbage in, garbage out. If the training data is biased, incomplete, or mislabelled, the patterns the model learns will reflect those flaws, no matter how sophisticated the algorithm.

A Note on Maths

I like to bring this one up frequently. As we covered in the previous blog post, one of the most common questions people ask when they start learning about machine learning is: do I need to be good at maths?

The honest answer is: not to start, and not as much as you'd think to go deep.

The mathematical foundations are there (linear algebra, probability, calculus) but they describe what's happening inside the algorithms, not how to use them. Most engineers use libraries like Scikit-learn that handle the implementation entirely. The maths becomes valuable when you want to understand why a model behaves a certain way, not to run it.

Start with intuition. Pick up the maths when a specific question pulls you toward it. That order works far better than studying maths in a vacuum before you've built anything.

What Comes Next

Now that you have the conceptual foundation, the natural next step is getting hands-on. The core skills to tackle in order:

Exploratory Data Analysis (EDA): understand your data before you model it
Data preparation: clean, transform, and structure data for training
Model training: apply the right algorithm for the task
Model evaluation: measure performance properly (accuracy alone isn't enough)
Iteration: improve, tune, and deploy

If you want the full practical path (tools, libraries, timeline, and projects) check out our guide on how to learn machine learning from scratch.

If you want to read our full blog, visit our blog here!

How to Learn Machine Learning from Scratch

Javier Aguirre — Fri, 29 May 2026 03:18:26 +0000

You want to learn machine learning. Great! Now you are staring at a screen full of courses, YouTube videos, Reddit threads, and bootcamp ads, and you have absolutely no idea where to begin.

After more than 10 years in the AI field, I have seen brilliant people give up on machine learning not because it was too hard, but because they started in the wrong place, hit a wall they did not expect, and concluded the whole thing was not for them.

This post is the guide I wish someone had handed me at the beginning. The honest, practical path to go from complete beginner to someone who can actually build and deploy machine learning models. And let me tell you one last thing, it is definitely not as hard as it seems. Trust me :)

Let's get into it.

First, Let's Clear Something Up

When people ask "how do I start machine learning?", they usually follow it up with something like: "do I need a PhD? Do I need to be amazing at math? Do I need to already know how to code?"

The answer to all three is clearly no.

Machine learning for beginners has never been more accessible. The tools are better, the libraries do most of the heavy lifting, and the community has produced genuinely good learning resources. What you need is not genius but a clear path and the willingness to follow it.

Here is that path.

Step 1: Python First, But Not Too Much Python!

Before you touch a single machine learning concept, you need to be able to write basic Python. Not software-engineer-level Python. Just enough to load data, write a function, and run a script.

What you actually need:

Variables and data types
Loops and conditionals
Functions
Lists and dictionaries
Importing libraries

That is it. A few weeks of consistent practice will get you there. Do not disappear into a six-month Python deep dive: that is procrastination dressed up as preparation. Ask your best friends (I mean Claude, Gemini, ChatGPT... they are amazing) for help; they really know how to code and how to teach coding.

Once you can write simple scripts without completely panicking, you are ready.

Step 2: What Is Machine Learning, Actually?

Here is the simplest way to think about it: normally, when you write code, you write the rules. You tell the computer exactly what to do in every situation. Machine learning flips that around. Instead of writing the rules, you give the algorithm a pile of examples (historical data where you already know the outcome) and it figures out the rules itself. Sounds pretty cool, doesn't it?

That is genuinely it. A machine learning model is just a pattern-finding machine. You feed it enough examples, it learns what those examples have in common, and then it uses that knowledge to make predictions on data it has never seen before. The more examples, the better the patterns. The better the patterns, the more accurate the predictions.
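
The contrast between writing the rule and learning the rule can be sketched in a few lines. This is a toy illustration, not a real ML algorithm from any library: the exam data, hand_written_rule and learn_threshold are all hypothetical names invented for the example.

```python
# Hypothetical labelled examples: (hours_studied, passed_exam)
examples = [(1, False), (2, False), (3, False), (4, True), (5, True), (6, True)]


def hand_written_rule(hours):
    # Traditional programming: a human guesses the threshold.
    return hours >= 5


def learn_threshold(data):
    """Try every observed value as a threshold and keep the one that
    classifies the most examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate, _ in data:
        correct = sum((hours >= candidate) == label for hours, label in data)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


threshold = learn_threshold(examples)
print(threshold)        # the rule found from the data
print(7 >= threshold)   # prediction for an example it has never seen
```

The hand-written rule gets the 4-hour student wrong, while the learned threshold is picked to fit the examples. Real models do the same thing with many more inputs and far smarter search.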

Step 3: The Core Skills of Machine Learning

Pay attention! This is where most people go wrong and get lost, but it is not hard if you know the path. People think machine learning is about knowing a long list of algorithms. It is not. Others think they need a math background for it. That is simply not true. It is about mastering a set of core skills that every project requires, in roughly the same order, every single time.

Here is what that actually looks like.

Exploratory Data Analysis (EDA)

Before you build anything, you need to understand what you are working with. For example, say you want to predict who will keep up with their mortgage payments based on past data. At that point you should ask yourself: Where does the data come from? What do the columns actually mean? What is missing? What looks suspicious? EDA is the skill that separates people who build models that work from people who build models that silently fail. It is also the step that is least taught and most skipped. As with Python, do not spend six months learning EDA; spend a few weeks and move on.
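
Here is a minimal sketch of what those first EDA questions look like in code, using only Python's standard library. The mortgage table is invented for the example (in a real project it would come from a file), and in practice you would most likely do this with Pandas instead.

```python
import csv
import io
import statistics

# Hypothetical mortgage data, inlined so the example runs on its own.
raw = """income,loan_amount,employment,paid
52000,180000,salaried,yes
31000,,self-employed,no
87000,250000,salaried,yes
,120000,salaried,yes
45000,9000000,unknown,no
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# What is missing? Count empty cells per column.
missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# What looks suspicious? Compare the largest loan with the median one.
loans = [float(r["loan_amount"]) for r in rows if r["loan_amount"]]
loan_median = statistics.median(loans)
loan_max = max(loans)

# How balanced is the outcome we want to predict?
paid_counts = {}
for r in rows:
    paid_counts[r["paid"]] = paid_counts.get(r["paid"], 0) + 1

print(missing)
print(loan_median, loan_max)
print(paid_counts)
```

A maximum loan roughly forty times the median is exactly the kind of thing you want to notice before training anything.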

Data Preparation

Real-world data is a mess. Missing values, inconsistent formats, outliers that make no sense, categorical variables that need to be converted into numbers. The previous step helps you understand all of it; in this step you focus on preparing it for training. Data preparation is where you spend most of your time on any real project. Learn to clean data well, and everything downstream becomes easier.
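
Two of the most common preparation tasks, filling in missing values and turning categories into numbers, can be sketched like this. The records are hypothetical, and filling with the median is just one reasonable choice among several, not the only correct one.

```python
import statistics

# Hypothetical raw records with a missing value and a text category.
records = [
    {"income": 52000.0, "employment": "salaried"},
    {"income": None, "employment": "self-employed"},
    {"income": 87000.0, "employment": "salaried"},
    {"income": 45000.0, "employment": "self-employed"},
]

# 1. Fill missing incomes with the median of the known ones.
known = [r["income"] for r in records if r["income"] is not None]
income_median = statistics.median(known)

# 2. One-hot encode the category: one 0/1 column per distinct value.
categories = sorted({r["employment"] for r in records})

prepared = []
for r in records:
    income = r["income"] if r["income"] is not None else income_median
    one_hot = [1 if r["employment"] == c else 0 for c in categories]
    prepared.append([income] + one_hot)

print(categories)
for row in prepared:
    print(row)
```

Every row is now a plain list of numbers, which is the shape most models expect. On a real project, compute the median from the training data only, so nothing leaks in from the test set.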

Model Training

This is the step everyone gets wrong. People think it is the big one. It is not. In practice, once your data is clean, training often takes a few lines of code and you are ready to go. The two previous steps, having good data and preparing it well, are where the magic of machine learning happens. One disclaimer, though: I recommend learning how the different models work so that you know when to use one or another. Even a short class on how they work will benefit you greatly.

Model Evaluation

This one is important. It is not hard, but it can be slightly confusing, and many beginners make mistakes without realizing it. Accuracy alone is a terrible metric for most projects, especially when one outcome is much rarer than the other. Learn precision, recall, F1-score, ROC-AUC. Understand the difference between overfitting (your model memorised the training data and fails on anything new) and underfitting (your model is too simple to capture the real patterns). Know the difference between your training set, validation set, and test set, and never, ever mix them up.
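
To see why accuracy alone can mislead, here is a small sketch that computes accuracy, precision, recall and F1 by hand. The fraud labels and both sets of predictions are made up for the example. Scikit-learn ships ready-made versions of these metrics, so the hand-written evaluate function is only there to show what they mean.

```python
# Hypothetical fraud labels: 1 = fraud, 0 = legitimate. Only 2 of 10 are fraud.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A lazy "model" that always predicts "legitimate".
y_lazy = [0] * 10
# A model that catches one fraud case and raises one false alarm.
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]


def evaluate(truth, predicted):
    pairs = list(zip(truth, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed positives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


lazy_scores = evaluate(y_true, y_lazy)
model_scores = evaluate(y_true, y_model)
print(lazy_scores)
print(model_scores)
```

Both "models" reach the same accuracy, yet the lazy one never finds a single fraud case. Recall and F1 expose that difference; accuracy hides it.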

Model Improvement (optional)

Once you have a baseline model, you can make it better. This means tuning hyperparameters, trying different algorithms, engineering better features from your raw data. This is where craft comes in and where the interesting problem-solving happens.

Deployment (optional)

A model sitting on your laptop is not useful. Learn to put it somewhere that actually does something: an API, a simple web app, a scheduled job. You do not need to become a software engineer to do this, but you need to know the basics.

Step 4: The Algorithms Worth Knowing

Once you understand the core skills, algorithms start to make sense. Now you know what problem they are solving. You do not need to memorise fifty of them. You need to understand a core set deeply.

Linear Regression

Predicting a continuous number (price, temperature, revenue). The simplest model you will build and, by far, the most educational. Understand this one fully before you move on.
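
As a rough sketch of what "fitting a line" means, here is ordinary least squares for a single feature in plain Python. The sizes and prices are invented and deliberately lie on a perfect line so the result is easy to check. In a real project Scikit-learn's LinearRegression would do this for you.

```python
# Hypothetical data: house size in square metres and price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # price for an unseen 100 square-metre house
print(slope, intercept, predicted)
```

The slope says how much the price changes per extra square metre, which is why this model is so useful for building intuition.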

Logistic Regression

Despite the name, this is a classification algorithm. Will this customer churn? Is this email spam? Binary decisions. This is your first taste of classification.
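
The "regression" in the name comes from the fact that the model first computes a number and then squashes it into a probability between 0 and 1. A minimal sketch of the prediction side, assuming the weight and bias have already been learned (the values below are hypothetical, not trained on anything):

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-learned parameters for a churn model.
weight_complaints = 0.8
bias = -2.0


def churn_probability(complaints):
    return sigmoid(weight_complaints * complaints + bias)


def will_churn(complaints, threshold=0.5):
    # The binary decision is just the probability compared to a threshold.
    return churn_probability(complaints) >= threshold


print(churn_probability(0), will_churn(0))
print(churn_probability(5), will_churn(5))
```

Training is the part that finds good values for the weight and bias; the library handles that for you.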

K-Nearest Neighbours (KNN)

The most intuitive classifier in ML. To predict something, it looks at the K closest examples in your training data and goes with the majority. No real "training" happens. It just memorises the data and reasons from it at prediction time. A great first algorithm to understand because the logic is completely transparent.
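
Because the logic is so transparent, KNN fits in a handful of lines. This is a sketch with made-up two-feature data and a hypothetical knn_predict function. Scikit-learn's KNeighborsClassifier is what you would use in practice.

```python
import math
from collections import Counter

# Hypothetical training data: (feature vector, label)
training = [
    ((1.0, 1.0), "cat"),
    ((1.5, 2.0), "cat"),
    ((2.0, 1.0), "cat"),
    ((6.0, 5.0), "dog"),
    ((7.0, 7.0), "dog"),
    ((6.5, 6.0), "dog"),
]


def knn_predict(point, data, k=3):
    # "Training" is just keeping the data; the work happens at prediction time.
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k closest examples.
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict((2.0, 2.0), training))
print(knn_predict((6.0, 6.0), training))
```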

K-Means Clustering

Your entry point into a different kind of ML: unsupervised learning, where you have no labels and let the algorithm find structure on its own. K-Means groups your data points into K clusters based on similarity. Used everywhere from customer segmentation to anomaly detection.
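
The core loop of K-Means is just two steps repeated: assign each point to its nearest centre, then move each centre to the middle of its points. Here is a sketch with hypothetical customer data. The starting centroids are fixed by hand so the result is reproducible, whereas real implementations such as Scikit-learn's KMeans choose them for you.

```python
import math

# Hypothetical customers: (visits per month, average basket in euros)
points = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]


def k_means(data, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


centroids, clusters = k_means(points, centroids=[(0, 0), (10, 100)])
print(centroids)
print(clusters)
```

Notice that no labels were given: the two groups (occasional small-basket shoppers and frequent big-basket shoppers) come out of the data on their own.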

Decision Trees

One of the most intuitive models in all of ML. You can actually visualise how a tree makes decisions, which is invaluable for building intuition.
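
A tree is built by repeatedly asking "which question splits my data into the purest groups?". The sketch below finds a single such split (one level of a tree) on made-up loan data, using Gini impurity as the purity measure. The names gini and best_split are hypothetical helpers for this example, not a library API.

```python
# Hypothetical loan data: (income in thousands, repaid?)
loans = [(20, False), (25, False), (30, False), (45, True), (60, True), (80, True)]


def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    if not labels:
        return 0.0
    share_true = sum(labels) / len(labels)
    return 1.0 - share_true ** 2 - (1.0 - share_true) ** 2


def best_split(rows):
    """Find the threshold on a single feature that gives the purest groups."""
    best = (None, float("inf"))
    for threshold in sorted({value for value, _ in rows}):
        left = [label for value, label in rows if value < threshold]
        right = [label for value, label in rows if value >= threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


threshold, impurity = best_split(loans)
print(threshold, impurity)
```

The result reads as a plain rule ("income of at least this much means repaid"), which is why trees are so easy to visualise. A full tree simply repeats the same search inside each group.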

Random Forests

A collection of decision trees working together. One of the most reliable, robust algorithms in practice. If you are ever in doubt about which model to try first, try Random Forest.

Support Vector Machines (SVM)

Finds the boundary that best separates two classes, with as much margin between them as possible. Works particularly well with smaller datasets and high-dimensional data like text. The intuition behind "maximum margin separation" is one of the most elegant ideas in all of ML.

Gradient Boosting (XGBoost, LightGBM)

The go-to for structured/tabular data in production. These models win Kaggle competitions constantly. Learn them and you will be dangerous on real business problems.

You do not need to implement these from scratch. What you need is to understand: why does this algorithm work? When should I use it? What does it assume about the data?

Step 5: The Libraries You Will Actually Use

Python for machine learning means a short list of libraries that you will use over and over again:

NumPy

Numerical computing. Arrays, matrix operations. Under the hood of almost everything.

Pandas

Your data manipulation workhorse. Load CSVs, clean data, merge tables, explore distributions. You will use this constantly.

Matplotlib / Seaborn

Visualise your data. Plot distributions, spot outliers, understand what you are working with.

Scikit-learn

The gold standard ML library. Every classical algorithm you need, with a consistent API that is genuinely well-designed. This is where you will spend most of your time as a beginner.

XGBoost / LightGBM

Once you are comfortable with Scikit-learn, add these to your toolkit. They are industry workhorses.

Do not chase every new library. Master these first.

Step 6: Build Things That Are Slightly Uncomfortable

Reading is not learning machine learning. Building is learning machine learning.

After each concept, build something. It does not need to be impressive — it needs to be real. Some ideas:

Predict housing prices with linear regression on a public dataset
Build a spam classifier with logistic regression on email data
Predict customer churn with a Random Forest

Go to Kaggle. Find a beginner competition. Download the data. Make a terrible first submission. Then make a slightly less terrible second one. That process teaches you more than any course.

The discomfort of working with messy, real data and not knowing exactly what to do is not a sign that you are doing something wrong. It is the actual learning.

On Math: Stop Worrying About It

Yes, machine learning has mathematical foundations. Linear algebra, probability, calculus, statistics. They are all in there.

Here is the truth: you do not need to master any of that. Most of it is for people who will create the algorithms of the future, not for people building AI with the existing ones. You need enough statistics to understand what a mean and a variance are. That is genuinely it.

Now, as you go deeper and start wondering why certain algorithms behave the way they do, you will naturally find yourself reading about the math behind them, with no need to master it. That is when it clicks, because you have context. Learning math in isolation, before you have built anything, is like studying the grammar of a language you have never spoken.

Pick up the math as you need it. Not before.

A Realistic Timeline

For someone starting from scratch and putting in consistent time:

Weeks 1–4

Python basics. Get comfortable with the language.

Months 2–3

Core ML skills and concepts: EDA, data prep, training, evaluation, the main algorithms, Scikit-learn. Build small projects.

Months 4–5

Go deeper. Tackle a real Kaggle dataset. Handle genuinely messy data. Deploy something small.

Months 6+

Expand: gradient boosting, feature engineering, model evaluation in depth.

This assumes a few focused hours per week, not full-time immersion. If you go full-time, compress everything. Consistency matters far more than intensity.

The Summary

Start with Python basics. Learn what machine learning actually is: pattern recognition from data, not magic. Then master the core skills in order: understand your data, prepare it, train a model, evaluate it properly, improve it, and deploy it. Learn the key algorithms well rather than every algorithm superficially. Build real things with real data. Pick up math as you need it, not before. That is how you learn machine learning from scratch.

At Fondra Labs, we are building the step-by-step resources to walk you through exactly this journey. Stay tuned.

Frequently Asked Questions

How to learn machine learning?

Start with Python basics, then work through the core ML skills in order: exploratory data analysis, data preparation, model training, evaluation, and deployment. Use Scikit-learn. Build real projects with real data.

How to get into machine learning?

You do not need a degree or math expertise to start. Pick up Python, learn the fundamentals, and build a portfolio of projects using public datasets. Kaggle is a great place to start.

How to start machine learning?

Write Python first — just the basics. Then pick one beginner dataset and try to build a prediction model with Scikit-learn. That first project, however messy, teaches you more than any course.

How do I start learning machine learning?

Pick one clear resource, follow it from start to finish, and build something at every stage. The most common mistake is jumping between resources constantly instead of going deep on one path.

How to become a machine learning engineer?

Learn the fundamentals, build a public portfolio of projects on GitHub, document what you built and why, and start applying. You do not need a perfect CV, you need demonstrated ability to work with real data and ship real models.

For more content you can visit our blog here!

AI Learning Roadmap: Where to Start if You're a Complete Beginner

Javier Aguirre — Wed, 27 May 2026 04:41:24 +0000

Nowadays, AI is everywhere. More than ever, people want to learn, but the internet is so flooded with resources that it is incredibly hard to know where to start. It feels like there is too much information, pointing in too many directions. I have spent more than 10 years in the AI field, and this post covers what you actually need to understand the dos and don'ts of an effective AI learning roadmap. Keep on reading :)

The problem

Everyone starts in the wrong place

Everyone has heard of ChatGPT. Everyone has heard of LLMs, image generators, voice assistants. And so, naturally, everyone starts there because that's what's visible, exciting and all over the news.

Here's the thing: that is the biggest mistake you can make. What you see in ChatGPT is the latest and most complex technology in the entire AI field. Starting there is like deciding you want to become a chef and showing up to a three-Michelin-star kitchen on day one. It looks great from the outside. Inside, you will be completely lost.

But don't panic! There is a right way to learn AI, and it is not as hard as you think. It does require starting at the foundation, and I am going to show you exactly what that looks like for anyone wondering how to start learning AI.

The golden rule:
"Don't chase the latest. Master the foundations first, and the latest will start making sense on its own."
Javier Aguirre

Before diving in: learn basic Python

Before we talk about AI concepts at all, there is one practical thing to do first: learn the basics of Python. It is the language of AI, and you do not need to become a software engineer — you just need enough to write simple scripts, load data, and run models.

A few weeks of basics is more than enough to get started. Variables, loops, functions, lists. That's it for now. This is why so many people begin with "Python for AI beginners" courses before moving deeper into machine learning. And here's good news that will surprise most beginners:

On math:
"You do not need math to get into AI. Full stop. You may eventually bump into a concept here and there (a few statistics ideas, basic linear algebra) but those are easy to pick up when the moment comes. Don't let math be the reason you don't start."

The AI Learning Roadmap

How the AI world is actually structured

Before going straight in, you need to understand the landscape. That thing you've heard of — Generative AI — is not the beginning of the story. It is the current top of a much bigger structure. Think of it like a house: Generative AI is the roof. And nobody builds a house starting from the roof.

Let's look at how the AI house actually looks.

This diagram tells you everything you need to know about why most beginners struggle. Generative AI (ChatGPT, Midjourney, and the tools making headlines) sits at the centre of these nested layers. Every concept that powers it comes from the layers around it. Skip those layers and you are building your understanding on nothing.

My personal recommendations

Start with Machine Learning

Machine Learning is the oldest and most foundational part of modern AI. It is also, in many ways, the most powerful. It is driving enormous amounts of revenue across industries right now, from fraud detection to demand forecasting to personalisation engines. Companies are not running it because it's trendy. They're running it because it works.

If you are following a machine learning engineer roadmap, this is where your real understanding begins.

Will it generate text like ChatGPT? No. But here is what it will let you do:

Predict outcomes: Will this customer churn? What will sales be next quarter? Which loan applicant is high risk? ML answers these questions with high accuracy, using nothing more than historical data and a well-chosen model.
Classify anything: Is this email spam or not? Is this transaction fraudulent? Does this medical scan show an anomaly? Classification is one of the most commercially valuable things in AI, and ML is the gold standard for it.
Deploy cheap and fast: ML models are lightweight. They run on a basic server, cost little to host, and can be put into production in days. This is the opposite of the expensive GPU-hungry infrastructure that Generative AI requires.
Build real AI intuition: Understanding how a Random Forest learns, why a model overfits, what a training set versus a test set means — these concepts transfer directly to every other area of AI. This is where you grow actual understanding, not just surface-level familiarity.

The core ideas in Machine Learning are: supervised learning (teaching a model with labelled examples), unsupervised learning (finding patterns without labels), and the full process of training, evaluating, and deploying a model. Get comfortable with these, and the rest of AI opens up. If you have ever asked yourself "what is machine learning?", this is the practical answer.

Then move to Deep Learning

Have you heard the words neural networks? Backpropagation? Deep learning? If so, this is what those words refer to. Deep learning is the natural evolution of classical machine learning, and understanding ML first means you will actually grasp why deep learning exists, not just how to use it.

Instead of traditional algorithms, deep learning uses networks of artificial neurons (layers upon layers of them) that learn extremely complex patterns from data. The results are more powerful for many tasks, and more flexible, but they require significantly more data and computing resources to train.
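
To make "layers of artificial neurons" a bit more concrete, here is a sketch of data flowing forward through a tiny two-layer network in plain Python. All weights and biases are hypothetical numbers picked by hand. In a real network they are learned from data, and frameworks handle the arithmetic far more efficiently.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One layer: every neuron takes a weighted sum of all inputs,
    adds its bias, and passes the result through an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0]

# Hidden layer with two neurons (hypothetical weights).
hidden = dense(inputs, weights=[[0.5, -1.0], [1.0, 1.0]],
               biases=[0.0, -1.0], activation=relu)

# Output layer with one neuron that turns the hidden values into a probability.
output = dense(hidden, weights=[[1.0, 0.5]], biases=[-1.0], activation=sigmoid)

print(hidden)
print(output)
```

Stack many more of these layers, with many more neurons each, and you get the "deep" in deep learning. That is also why it needs so much more data and computing power than classical ML.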

If you are building your own deep learning roadmap, this is the stage where AI starts becoming truly powerful.

Here is where deep learning truly shines:

Images and computer vision

Deep learning powers nearly every modern image recognition system, from the Face ID on your phone to the quality control cameras in a factory to self-driving car perception. On visual tasks, classical ML generally cannot match its accuracy.

Audio and speech

Voice assistants, real-time transcription, music generation, sound classification — all deep learning. The architecture that understands spoken language is built entirely on neural network layers.

Complex pattern recognition

For anything where the relationship between input and output is extremely non-linear and hard to express as rules, deep learning tends to be the right tool. Drug discovery, genomics, anomaly detection at scale.

Deep learning is also where you start encountering architectures with names such as CNNs for images, RNNs for sequences, and (most importantly) the Transformer. Remember that name. It is what everything else is built on. This is also the point where people finally understand "what is deep learning?" in a meaningful way.

Finally, Generative AI (only after ML and DL!!!)

Now (and only now) does Generative AI make sense. Because once you understand ML and deep learning, you understand where GenAI comes from. The Transformer architecture at the heart of every modern large language model is a deep learning architecture. The training techniques are derived from everything you have already learned. The intuition transfers.

Generative AI is extraordinary. It can write, code, reason, create images, generate music, and hold conversations. The commercial excitement around it is real and justified. But here is something most people entering the field do not know:

Important Reality Check:
Most problems companies actually have can be solved with ML or DL — not GenAI. GenAI is incredibly expensive to run, hard to scale reliably, and often complete overkill for the task at hand. Jumping to GenAI head-on, without foundations, is a mistake that costs time, money, and understanding. Do not do it.
Javier Aguirre

GenAI is the mix of everything learned before. Understanding it properly — knowing when to use it, when not to, how to build on top of it rather than just prompting it — requires the foundations you built in steps 1 and 2. That is what separates someone who truly works in AI from someone who just uses it.

If you are wondering "how does an AI learn?", the answer starts with these foundations in machine learning and deep learning, long before Generative AI enters the picture.

More times than I can remember, I have seen projects fail or get stuck because managers and higher-ups asked for GenAI when a decision tree would have solved the problem. I am not saying generative AI is not marvelous, but learning when to use it is one of the best favours you can do yourself as a developer.

Summary

The right order, at a glance

1st: Python basics. Just enough to write scripts and work with data. No advanced engineering needed.
2nd: Machine Learning. The oldest, most practical, most deployable, and most foundational layer of modern AI.
3rd: Deep Learning. Neural networks, images, audio, the Transformer. The powerful evolution that enables everything modern.
4th: Generative AI. The exciting frontier, but it only makes sense once the foundations are solid.

One last thing

You do not need math. You do not need to be a genius. You do not need expensive bootcamps. You need consistency, the right order, and a willingness to build things even when they break.

At Fondra Labs, we are building the resources to walk you through every step of this journey in depth. Stay tuned. The real learning starts now.