
Part 3: Building Your Own AI - Understanding the Basics of Machine Learning Algorithms

Author: Trix Cyrus

Try My Waymap Pentesting Tool: Click Here
TrixSec Github: Click Here
TrixSec Telegram: Click Here


Machine Learning (ML) algorithms form the backbone of AI systems. In this article, we’ll break down the fundamental types of ML algorithms—supervised, unsupervised, and reinforcement learning—exploring how they work, their applications, and the key steps involved in preparing datasets for these models.


1. Types of Machine Learning Algorithms

a. Supervised Learning

Supervised learning involves training a model using labeled data, where both input and output are known. The model learns to map inputs to the correct output, making predictions for new data.

  • Examples of Algorithms:

    • Linear Regression: Predicts a continuous value based on input features.
    • Logistic Regression: Used for binary classification tasks (e.g., spam vs. not spam).
    • K-Nearest Neighbors (KNN): Classifies data points based on their similarity to neighbors.
  • Applications:

    • Spam detection in emails.
    • Predicting house prices based on features like size and location.
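
To make this concrete, here is a minimal sketch using scikit-learn's LogisticRegression. The tiny hours-studied dataset is made up purely for illustration; the point is that the model is trained on labeled examples and then predicts labels for new inputs.

from sklearn.linear_model import LogisticRegression

# Toy labeled dataset: hours studied (input) -> passed the exam (label)
X = [[1], [2], [3], [4], [5], [6]]   # input feature
y = [0, 0, 0, 1, 1, 1]               # known labels (0 = fail, 1 = pass)

model = LogisticRegression()
model.fit(X, y)                      # learn the mapping from inputs to labels

print(model.predict([[3.5]]))        # predict the label for an unseen input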

b. Unsupervised Learning

In unsupervised learning, the model works with unlabeled data, identifying patterns, clusters, or structures without explicit guidance.

  • Examples of Algorithms:

    • Clustering (e.g., K-Means): Groups similar data points into clusters.
    • Dimensionality Reduction (e.g., PCA): Reduces the number of features while retaining the essential information.
  • Applications:

    • Customer segmentation in marketing.
    • Anomaly detection in fraud detection.
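
As a quick illustration of the unsupervised setting, here is a rough sketch of dimensionality reduction with scikit-learn's PCA. The numbers are invented, and no labels are involved at any point.

from sklearn.decomposition import PCA

# Unlabeled data with 3 features per sample (values are made up)
X = [[2.5, 2.4, 0.5],
     [0.5, 0.7, 1.9],
     [2.2, 2.9, 0.4],
     [1.9, 2.2, 0.8]]

pca = PCA(n_components=2)          # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)   # fit and transform without any labels

print(X_reduced.shape)             # (4, 2)
print(pca.explained_variance_ratio_)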

c. Reinforcement Learning

Reinforcement learning (RL) is a feedback-based learning approach where an agent learns to make decisions by interacting with an environment to maximize rewards.

  • Key Components:

    • Agent: The decision-maker.
    • Environment: The space the agent interacts with.
    • Reward: Feedback to guide the agent’s actions.
  • Examples of Algorithms:

    • Q-Learning.
    • Deep Q-Networks (DQN).
  • Applications:

    • Game AI (e.g., AlphaGo).
    • Robotics and autonomous vehicles.
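
The heart of Q-Learning is a simple update rule: nudge the value of a state-action pair toward the observed reward plus the discounted value of the best next action. Below is a minimal tabular sketch; the tiny 4-state, 2-action environment is hypothetical and only meant to show the update itself.

import numpy as np

# Tabular Q-learning update (hypothetical environment: 4 states, 2 actions)
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9            # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # Move Q(s, a) toward the reward plus the discounted best future value
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q)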

2. Common Algorithms and How They Work

Linear Regression (Supervised)

Linear regression predicts a continuous output (e.g., sales or temperature).

  • Formula:

    y = mx + b

    Where:

    • y: Predicted value.
    • x: Input feature.
    • m: Slope (weight).
    • b: Intercept (bias).
  • Example:

    Predicting house prices based on square footage.
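
A quick sketch with scikit-learn's LinearRegression shows how the slope m and intercept b are learned from data; the square-footage and price values below are made up for illustration.

from sklearn.linear_model import LinearRegression

# Square footage (x) and sale price (y); values are invented
X = [[800], [1000], [1200], [1500], [1800]]
y = [150000, 180000, 210000, 260000, 300000]

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0], model.intercept_)   # learned slope m and intercept b
print(model.predict([[1300]]))            # predicted price for a 1300 sq ft house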


K-Nearest Neighbors (KNN) (Supervised)

KNN is a classification algorithm that assigns a data point to the class most common among its k nearest neighbors.

  • Steps:

    1. Calculate the distance between data points (e.g., Euclidean distance).
    2. Identify the k nearest neighbors.
    3. Assign the majority class label.
  • Example:
    Classifying whether a flower is Iris-setosa or Iris-versicolor based on petal length and width.
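
Here is a minimal sketch of that Iris example with scikit-learn's KNeighborsClassifier, assuming the Iris dataset bundled with scikit-learn and k = 3.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, 2:4], iris.target      # petal length and petal width only

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors
knn.fit(X, y)

# Classify a new flower by the majority class of its 3 nearest neighbors
print(iris.target_names[knn.predict([[1.4, 0.2]])])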


Clustering with K-Means (Unsupervised)

K-Means groups data points into k clusters based on their similarity.

  • Steps:

    1. Initialize k cluster centroids randomly.
    2. Assign each data point to the nearest centroid.
    3. Update centroids based on the mean of assigned points.
    4. Repeat until convergence.
  • Example:
    Segmenting customers based on purchasing behavior.
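
A rough sketch with scikit-learn's KMeans; the annual-spend and purchase-count values are invented to suggest two distinct customer segments.

from sklearn.cluster import KMeans

# Each row: (annual spend, number of purchases) -- toy values for illustration
X = [[500, 5], [520, 6], [480, 4],
     [5000, 50], [5200, 55], [4900, 48]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)      # assign each customer to a cluster

print(labels)                        # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)       # centroid of each cluster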


3. Preparing Datasets for ML Models

a. Data Collection

Gather relevant and high-quality data. Use sources like APIs, web scraping, or public datasets (e.g., Kaggle, UCI ML Repository).


b. Data Cleaning

Remove irrelevant or corrupted data. Steps include:

  • Handling missing values (e.g., mean imputation).
  • Removing duplicates.
  • Converting categorical data into numerical format (e.g., one-hot encoding).
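
A short pandas sketch covering all three steps on a small, hypothetical dataset:

import pandas as pd

# Hypothetical raw dataset with a missing value, a duplicate row, and a categorical column
df = pd.DataFrame({
    "size_sqft": [800, 1000, None, 1000, 1500],
    "city": ["Austin", "Boston", "Austin", "Boston", "Austin"],
    "price": [150000, 180000, 160000, 180000, 260000],
})

df["size_sqft"] = df["size_sqft"].fillna(df["size_sqft"].mean())  # mean imputation
df = df.drop_duplicates()                                          # remove duplicates
df = pd.get_dummies(df, columns=["city"])                          # one-hot encoding

print(df)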

c. Data Splitting

Divide the dataset into:

  • Training Set (70–80%): Used to train the model.
  • Test Set (20–30%): Used to evaluate model performance.
from sklearn.model_selection import train_test_split

# Reserve 20% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

d. Feature Scaling

Normalize or standardize features to ensure all data is on the same scale.

  • Normalization: Scales data to a range of 0 to 1.
  • Standardization: Centers data around 0 with a standard deviation of 1.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit the scaler on the training data only, then apply the same transform to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
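
Normalization works the same way; here is a quick sketch with scikit-learn's MinMaxScaler, reusing the X_train and X_test from the split above.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                       # rescales each feature to the 0-1 range
X_train_norm = scaler.fit_transform(X_train)  # fit on training data only
X_test_norm = scaler.transform(X_test)        # apply the same scaling to test data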

~Trixsec
