Part 3: Building Your Own AI - Understanding the Basics of Machine Learning Algorithms

Author: Trix Cyrus

Try my Waymap Pentesting Tool: Click Here
TrixSec Github: Click Here
TrixSec Telegram: Click Here


Machine Learning (ML) algorithms form the backbone of AI systems. In this article, we’ll break down the fundamental types of ML algorithms—supervised, unsupervised, and reinforcement learning—exploring how they work, their applications, and the key steps involved in preparing datasets for these models.


1. Types of Machine Learning Algorithms

a. Supervised Learning

Supervised learning involves training a model using labeled data, where both input and output are known. The model learns to map inputs to the correct output, making predictions for new data.

  • Examples of Algorithms:

    • Linear Regression: Predicts a continuous value based on input features.
    • Logistic Regression: Used for binary classification tasks (e.g., spam vs. not spam).
    • K-Nearest Neighbors (KNN): Classifies data points based on their similarity to neighbors.
  • Applications:

    • Spam detection in emails.
    • Predicting house prices based on features like size and location.
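
As a rough sketch of the supervised workflow, the example below trains scikit-learn's LogisticRegression on a tiny made-up "spam" dataset (the two features — word count and whether the email contains a link — and all the labels are purely illustrative):

```python
from sklearn.linear_model import LogisticRegression

# Toy labeled data: [word_count, contains_link] -> 1 = spam, 0 = not spam
X = [[300, 0], [120, 0], [15, 1], [40, 1], [250, 0], [10, 1]]
y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)                   # learn the input -> output mapping

# Predict for a new, unseen email: short and containing a link
print(model.predict([[20, 1]]))   # likely classified as spam (1)
```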

b. Unsupervised Learning

In unsupervised learning, the model works with unlabeled data, identifying patterns, clusters, or structures without explicit guidance.

  • Examples of Algorithms:

    • Clustering (e.g., K-Means): Groups similar data points into clusters.
    • Dimensionality Reduction (e.g., PCA): Reduces the number of features while retaining the essential information.
  • Applications:

    • Customer segmentation in marketing.
    • Anomaly detection (e.g., flagging fraudulent transactions).
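
To make dimensionality reduction concrete, here is a small sketch using synthetic data: five correlated features that are really just linear combinations of two underlying ones, so PCA can compress them to two components while keeping essentially all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
base = rng.normal(size=(100, 2))                    # 2 true underlying features
extra = base @ rng.normal(size=(2, 3))              # 3 redundant linear combinations
X = np.hstack([base, extra])                        # 100 samples x 5 features, rank 2

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                              # (100, 2)
print(pca.explained_variance_ratio_.sum())          # ~1.0: almost no information lost
```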

c. Reinforcement Learning

Reinforcement learning (RL) is a feedback-based learning approach where an agent learns to make decisions by interacting with an environment to maximize rewards.

  • Key Components:

    • Agent: The decision-maker.
    • Environment: The space the agent interacts with.
    • Reward: Feedback to guide the agent’s actions.
  • Examples of Algorithms:

    • Q-Learning.
    • Deep Q-Networks (DQN).
  • Applications:

    • Game AI (e.g., AlphaGo).
    • Robotics and autonomous vehicles.
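
The agent/environment/reward loop can be sketched with minimal tabular Q-learning on a made-up environment: a five-cell corridor where the agent earns a reward of 1 for reaching the rightmost cell. The environment, hyperparameters, and episode count below are all illustrative:

```python
import numpy as np

n_states, n_actions = 5, 2   # five-cell corridor; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Environment: move one cell; the rightmost cell gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: act randomly to explore, or when Q has no preference yet
        if rng.random() < epsilon or Q[state, 0] == Q[state, 1]:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned policy prefers "right" in every non-terminal cell
```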

2. Common Algorithms and How They Work

Linear Regression (Supervised)

Linear regression predicts a continuous output (e.g., sales or temperature).

  • Formula:

    y = mx + b

    Where:

    • ( y ): Predicted value.
    • ( x ): Input feature.
    • ( m ): Slope (weight).
    • ( b ): Intercept (bias).
  • Example:

    Predicting house prices based on square footage.
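
As a quick check of the ( y = mx + b ) formula, the sketch below fits scikit-learn's LinearRegression to made-up, perfectly linear house-price data, so the learned slope and intercept can be read off directly:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical data: square footage -> price in $1000s (exactly price = 0.2 * sqft)
X = [[1000], [1500], [2000], [2500], [3000]]
y = [200, 300, 400, 500, 600]

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0])           # slope m, recovered as 0.2
print(model.intercept_)         # intercept b, recovered as ~0
print(model.predict([[1800]]))  # 0.2 * 1800 = 360
```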


K-Nearest Neighbors (KNN) (Supervised)

KNN is a classification algorithm that assigns a data point to the class most common among its ( k ) nearest neighbors.

  • Steps:

    1. Calculate the distance between data points (e.g., Euclidean distance).
    2. Identify the ( k ) nearest neighbors.
    3. Assign the majority class label.
  • Example:
    Classifying whether a flower is Iris-setosa or Iris-versicolor based on petal length and width.
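
This Iris example can be run directly with scikit-learn, whose built-in dataset stores Iris-setosa and Iris-versicolor as its first 100 rows:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:100, 2:]   # petal length and petal width only
y = iris.target[:100]     # classes 0 (setosa) and 1 (versicolor)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors
knn.fit(X, y)

# A short, narrow petal is characteristic of Iris-setosa
print(iris.target_names[knn.predict([[1.4, 0.2]])[0]])  # setosa
```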


Clustering with K-Means (Unsupervised)

K-Means groups data points into ( k ) clusters based on their similarity.

  • Steps:

    1. Initialize ( k ) cluster centroids randomly.
    2. Assign each data point to the nearest centroid.
    3. Update centroids based on the mean of assigned points.
    4. Repeat until convergence.
  • Example:
    Segmenting customers based on purchasing behavior.
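
The four steps above are exactly what scikit-learn's KMeans runs internally. The sketch below clusters a made-up customer table of [annual spend, visits per month] into two clearly separated groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual_spend, visits_per_month]
X = np.array([[100, 1], [120, 2], [110, 1],    # low spenders
              [900, 8], [950, 9], [880, 7]])   # high spenders

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)  # the first three and last three customers land in different clusters
```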


3. Preparing Datasets for ML Models

a. Data Collection

Gather relevant and high-quality data. Use sources like APIs, web scraping, or public datasets (e.g., Kaggle, UCI ML Repository).


b. Data Cleaning

Remove irrelevant or corrupted data. Steps include:

  • Handling missing values (e.g., mean imputation).
  • Removing duplicates.
  • Converting categorical data into numerical format (e.g., one-hot encoding).
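
The three cleaning steps above can be sketched with pandas on a tiny made-up table (the column names and values are illustrative):

```python
import pandas as pd

# Hypothetical raw data: a duplicate row, a missing value, and a categorical column
df = pd.DataFrame({
    "size":  [1200, 1500, None, 1500],
    "city":  ["A",  "B",  "A",  "B"],
    "price": [200,  300,  250,  300],
})

df = df.drop_duplicates()                           # remove the duplicated row
df["size"] = df["size"].fillna(df["size"].mean())   # mean imputation for missing sizes
df = pd.get_dummies(df, columns=["city"])           # one-hot encode the categorical column

print(df)
```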

c. Data Splitting

Divide the dataset into:

  • Training Set (70–80%): Used to train the model.
  • Test Set (20–30%): Used to evaluate model performance.
```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

d. Feature Scaling

Normalize or standardize features to ensure all data is on the same scale.

  • Normalization: Scales data to a range of 0 to 1.
  • Standardization: Centers data around 0 with a standard deviation of 1.
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit the scaler on the training set only, then reuse it on the test set,
# so test-set statistics never leak into training
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

~Trixsec
