DEV Community

Ngigi nyawira

Introduction to Machine Learning

What is Machine Learning?

Machine learning is a subset of AI that focuses on learning patterns from datasets to make predictions or decisions without explicit programming. By training algorithms on large datasets, these models generalize to new, unseen data, powering applications like image recognition, recommendation engines, and language models.

Types of Machine Learning.

1. Supervised Learning

Models are trained on labeled data. This means every piece of training data comes with a 'correct answer'.

  • Classification: predicts discrete categories or labels.
    Examples: Identifying spam emails (Spam Filtering Guide), recognizing objects in photos, or diagnosing diseases

  • Regression: predicts continuous numerical values.
    Examples: Predicting house prices based on location and size (Real Estate Price Prediction), forecasting weather temperatures, or estimating stock prices.
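
To make the supervised idea concrete, here is a minimal sketch of classification: a 1-nearest-neighbour classifier in plain Python. The fruit dataset and its feature names are invented for illustration; a real project would reach for a library such as scikit-learn.

```python
# A minimal sketch of supervised classification: a 1-nearest-neighbour
# classifier in plain Python. The tiny fruit dataset is invented.

def nearest_neighbour(train_X, train_y, query):
    """Predict the label of the training point closest to `query`."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best]

# Features: [weight_g, diameter_cm]; labels: the "correct answers".
X = [[150, 7.0], [170, 7.5], [120, 6.0], [300, 9.5], [280, 9.0]]
y = ["apple", "apple", "apple", "grapefruit", "grapefruit"]

print(nearest_neighbour(X, y, [160, 7.2]))   # → apple
print(nearest_neighbour(X, y, [290, 9.2]))   # → grapefruit
```

Because every training example carries a label, the model can be scored directly: did it predict the right answer or not?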

2. Unsupervised Learning

Models find patterns in unlabeled data; the goal is to discover hidden structure.

  • Clustering: grouping similar data points together.
    Examples: Segmenting customers by purchasing habits for targeted marketing or grouping news articles by topic.

  • Dimensionality Reduction: simplifying complex datasets by reducing the number of features while keeping the most important information.

  • Association Rule Learning: Finding rules that describe large portions of your data.
    Example: Identifying that people who buy beer also tend to buy diapers (market basket analysis).
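
Clustering can be sketched in a few lines. Below is a naive 1-D k-means in plain Python on invented "customer spend" values; note there are no labels anywhere, only the raw numbers, and the algorithm finds the two groups on its own.

```python
# A minimal sketch of clustering: 1-D k-means in plain Python.
# Real projects would use a library such as scikit-learn.

def kmeans_1d(points, k, iters=20):
    # Initialise centroids with the first k points (naive but simple).
    centroids = points[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        # Update step: each centroid moves to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Imagined "customer spend" values: two natural groups around 10 and 100.
spend = [8, 9, 11, 12, 95, 102, 98, 105]
centroids, clusters = kmeans_1d(spend, k=2)
print(sorted(round(c) for c in centroids))  # → [10, 100]
```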

3. Reinforcement Learning

The model learns through trial and error, guided by rewards.
Examples: Training robots to walk, teaching AI to play games like Chess or Go, and developing autonomous vehicles (Self-Driving Car Tech).
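
As a concrete (if toy) sketch, here is tabular Q-learning, a standard reinforcement-learning algorithm, on an invented 5-cell corridor: the agent starts at cell 0, is rewarded only for reaching cell 4, and learns purely by trial and error. All the numbers are illustrative.

```python
# A toy sketch of reinforcement learning: tabular Q-learning on a
# 5-cell corridor. Reward of +1 only for reaching the last cell.
import random

random.seed(0)
n_states, actions = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != 4:                        # episode ends at the goal cell
        # Mostly exploit the best known action, sometimes explore.
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda a: Q[(s, a)]))
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future.
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy action per non-goal cell after training.
policy = [max(actions, key=lambda a: Q[(s, a)]) for s in range(4)]
print(policy)
```

After training, the greedy policy should move right (+1) in every cell, since that is the shortest path to the reward.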

Features vs Labels.

Features and labels are the fundamental building blocks of a dataset in supervised learning. Machine learning systems use relationships between inputs to produce predictions.

  • In algebra, a relationship is often written as y = ax + b:
    y is the label we want to predict
    a is the slope of the line
    x are the input values
    b is the intercept

  • In ML, the relationship is written as y = b + wx:
    y is the label we want to predict
    w is the weight (the slope)
    x are the features (input values)
    b is the intercept

  • In Machine Learning terminology, the label is the thing we want to predict.
    It is like the y in a linear graph.
The label is the correct answer we want the model to predict or classify. Labels represent what we want to know; they exist only in supervised learning.

  • In Machine Learning terminology, the features are the input.
    They are like the x values in a linear graph
Features are the raw measurements or attributes fed into the model. They describe what we know about each example: the signal from which the model extracts patterns.
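
The y = b + wx relationship above can be learned directly from data. This sketch fits w and b with the closed-form least-squares formulas; the toy data is invented, generated exactly from y = 2 + 3x.

```python
# A minimal sketch of learning y = b + w*x from data using the
# closed-form least-squares solution (invented toy data).

def fit_line(xs, ys):
    """Return (b, w) minimising the squared error of y = b + w*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return b, w

# Features (x) and labels (y), generated from y = 2 + 3x exactly.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
b, w = fit_line(xs, ys)
print(b, w)  # → 2.0 3.0
```

The model "learns" the weight w and intercept b; at prediction time it only needs the features x.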

Standard Machine Learning workflow.

Every supervised ML project tends to follow the same familiar journey. You start by gathering raw data, then spend what feels like most of your time shaping it into useful features (the X your model will actually learn from). Alongside that, you collect or figure out your labels (the Y: the answers you're training the model to predict). Once you have both, you train your model on most of the data, hold a slice back to honestly test how well it generalised, and eventually ship it. But deployment isn't the finish line. The real world has a habit of drifting: the data coming in six months from now won't look quite like what you trained on, and a model that nobody's watching will quietly degrade.
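
The train/test-split step of that workflow can be sketched end-to-end in plain Python. The toy dataset and the one-parameter "model" (a single slope w) are invented for illustration; the point is that evaluation happens only on data the model never saw during training.

```python
# A sketch of the standard workflow: train on 80% of the data,
# hold 20% back for an honest evaluation (toy data, invented).
import random

random.seed(42)
# Features x and labels y from y ≈ 3x plus noise.
data = [(x, 3 * x + random.uniform(-1, 1)) for x in range(50)]

random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Train": estimate the slope w of y = w*x on the training set only.
w = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# "Evaluate": mean absolute error on the held-out test set.
mae = sum(abs(y - w * x) for x, y in test) / len(test)
print(f"w ≈ {w:.2f}, test MAE ≈ {mae:.2f}")
```

The held-out error is the honest number: it estimates how the model will behave on data it has never seen, which is exactly what deployment will throw at it.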

Unsupervised pipelines follow roughly the same shape, minus the label-wrangling. That part simply doesn't exist, which is often why you're using unsupervised methods in the first place. The bigger shift is at evaluation time. There's no ground truth to score against, so instead of computing an error metric, you're sitting with a domain expert, looking at the clusters or representations the model found, and asking: does this actually mean something? Is it useful? It's a more ambiguous, more human kind of validation, and honestly, often the most interesting part of the work.
