DEV Community

Madhav Ganesan


Introduction to Machine Learning

Artificial Intelligence

Any technique that makes a computer behave in a way that seems intelligent, such as pattern recognition, decision making, language understanding, or recommendations.

Machine Learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within datasets, allowing them to make predictions on new, similar data without explicit programming for each task.

Types of learning

1) Supervised Learning

It is an approach where a model is trained on labeled data, meaning each input has a corresponding correct output, so the model learns to map inputs to outputs.

Algorithms
1) Classification:
Predicting a category label, such as spam detection in emails.

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
  • Gradient Boosting (XGBoost, LightGBM, CatBoost)
  • Neural Networks

2) Regression:
Predicting a continuous value, such as house prices.

  • Linear Regression
  • Ridge / Lasso Regression
  • Decision Tree Regression
  • Random Forest Regression
  • SVR (Support Vector Regression)
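As a minimal sketch of regression, assuming scikit-learn and NumPy are installed, the snippet below fits `LinearRegression` to synthetic data generated from a made-up rule (y = 3x + 2 plus a little noise) and then predicts a continuous value for an unseen input. The rule, noise level, and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # one feature, 100 samples
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, 100)  # continuous target

model = LinearRegression()
model.fit(X, y)

print("Learned slope:", model.coef_[0])        # should be close to 3
print("Learned intercept:", model.intercept_)  # should be close to 2
print("Prediction for x = 4:", model.predict([[4.0]])[0])  # roughly 3*4 + 2 = 14
```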

Examples
Identifying the zip code from handwritten digits on an envelope
Determining whether a tumor is benign based on a medical image
Detecting fraudulent activity in credit card transactions

2) Unsupervised Learning

It is an approach where a model is trained on unlabeled data and tries to discover patterns, structures, or relationships within the data.

Algorithms

1) Clustering:
Grouping similar data points together.

  • K-Means
  • DBSCAN
  • Hierarchical Clustering
  • Gaussian Mixture Models

Ex. Customer segmentation
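A minimal clustering sketch, assuming scikit-learn and NumPy are installed: K-Means groups made-up "customer" points (annual spend, visits per month) into two clusters without ever seeing a label. The data and feature names are hypothetical; K-Means only needs to be told how many clusters to look for.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two hypothetical groups of customers: (annual spend, visits per month)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[20, 2], scale=1.0, size=(50, 2))
group_b = rng.normal(loc=[80, 10], scale=1.0, size=(50, 2))
X = np.vstack([group_a, group_b])

# No labels are given; K-Means only needs the number of clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("First 5 labels:", labels[:5], "Last 5 labels:", labels[-5:])
```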

2) Association:
Finding relationships or rules between variables in large datasets.

  • Apriori Algorithm
  • Eclat Algorithm
  • AIS Algorithm
  • FP-Growth (Frequent Pattern Growth)

Ex. People who buy bread also buy butter
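Association algorithms such as Apriori are built on two simple quantities: support (how often an itemset appears across all transactions) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both for the rule bread → butter over a small made-up list of baskets. It is not a full Apriori implementation, which would also search over all candidate itemsets.

```python
# Hypothetical shopping baskets (made-up data for illustration)
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimated P(rhs in basket | lhs in basket) = support(lhs and rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print("support({bread, butter}) =", support({"bread", "butter"}, transactions))
print("confidence(bread -> butter) =", confidence({"bread"}, {"butter"}, transactions))
```

Here bread appears in 4 of 5 baskets and bread with butter in 3 of 5, so the confidence is about 0.6 / 0.8 = 0.75.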

3) Dimensionality Reduction:
Reducing the number of features (variables) while keeping important information.

  • PCA (Principal Component Analysis)
  • t-SNE
  • UMAP
  • Autoencoders

Ex. Compressing image data
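A minimal dimensionality-reduction sketch, assuming scikit-learn is installed: PCA compresses the four Iris features into two components and reports how much of the original variance those two components keep. Standardizing first is a common choice (not a requirement) because PCA is sensitive to feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                           # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)   # put all features on the same scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)             # 150 samples, 2 features

print("Original shape:", X.shape, "Reduced shape:", X_2d.shape)
print("Variance kept by each component:", pca.explained_variance_ratio_)
print("Total variance kept:", pca.explained_variance_ratio_.sum())
```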

4) Anomaly Detection
Process of identifying data points that deviate significantly from the normal pattern in a dataset.

  • Isolation Forest
  • One-Class SVM
  • Local Outlier Factor (LOF)
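A minimal anomaly-detection sketch with scikit-learn's `IsolationForest` on made-up 2-D data: 100 ordinary points plus one far-away point. The idea behind the algorithm is that anomalies can be isolated with fewer random splits than normal points. `fit_predict` returns 1 for inliers and -1 for anomalies; the data and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly "normal" points around the origin, plus one obvious outlier (made-up data)
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([normal_points, outlier])

# Isolation Forest isolates points with random splits; anomalies need fewer splits
iso = IsolationForest(random_state=0)
pred = iso.fit_predict(X)   # +1 = inlier, -1 = anomaly

print("Prediction for the outlier:", pred[-1])
print("Number of points flagged as anomalies:", int((pred == -1).sum()))
```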

3) Semi-Supervised Learning

It is an approach that uses a combination of a small amount of labeled data and a large amount of unlabeled data to improve learning accuracy.

  • Label Propagation
  • Self-Training
  • Co-Training
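A minimal semi-supervised sketch using scikit-learn's `LabelPropagation` on the Iris dataset. To simulate a mostly unlabeled dataset, the snippet hides roughly 70% of the labels (scikit-learn's convention is to mark unlabeled samples with -1) and lets the model infer them from nearby labeled points. The 30% labeled fraction and the seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Pretend only ~30% of the labels are known; unlabeled points are marked with -1
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) > 0.3
y_partial = y.copy()
y_partial[unlabeled] = -1

# Labels spread from the labeled points to similar (nearby) unlabeled points
model = LabelPropagation()
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training point
inferred = model.transduction_[unlabeled]
print("Labeled points used:", int((~unlabeled).sum()))
print("Accuracy on the unlabeled points:", (inferred == y[unlabeled]).mean())
```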

4) Reinforcement Learning

It involves training an agent to make a sequence of decisions by rewarding desired behaviors and penalizing undesired ones.

Game Playing: Training an AI to play games like chess or Go.
Robotics: Training robots to perform tasks, such as walking or grasping objects.
Self-driving Cars: Training autonomous vehicles to navigate roads safely.

Common Algorithms
Q-Learning
Deep Q-Network (DQN)
Policy Gradient Methods
Actor-Critic Methods
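To make Q-Learning concrete, here is a minimal tabular sketch on a hypothetical five-cell corridor: the agent starts in cell 0, can step left or right, and receives a reward of 1 only on reaching cell 4. The environment, the reward, and the hyperparameters (learning rate, discount factor, exploration rate) are all made-up assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical environment: cells 0..4 in a row, cell 4 is the goal
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)              # index 0 = step left, index 1 = step right

def step(state, action_idx):
    """Move one cell, staying inside the corridor; reward 1 only at the goal."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))  # Q[state, action] = estimated future reward

for episode in range(500):
    state = 0
    for _ in range(100):                # cap the episode length
        # Epsilon-greedy: explore at random sometimes, otherwise act greedily
        # (ties between equal Q-values are broken at random)
        if rng.random() < epsilon:
            action = int(rng.integers(len(ACTIONS)))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q toward reward + discounted best next value
        target = reward if done else reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

policy = ["left" if np.argmax(Q[s]) == 0 else "right" for s in range(GOAL)]
print("Greedy action in cells 0-3:", policy)
```

After training, the greedy policy should step right in every cell, because the discounted reward for moving toward the goal is always higher than for moving away from it.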

Transformer

Transformers are a model architecture designed for processing sequential data, such as text, using mechanisms like self-attention. They excel at capturing long-range dependencies, and because attention looks at all positions at once, they can process a sequence in parallel rather than token by token. They are not inherently tied to a specific learning paradigm such as supervised, unsupervised, or reinforcement learning; instead, they can be used within any of these paradigms, depending on the task they are applied to.
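At the core of the transformer is scaled dot-product self-attention: each position gets a query, key, and value vector, and its output is a weighted average of all value vectors, with weights softmax(Q Kᵀ / √d_k). The NumPy sketch below computes a single attention head with random projection matrices. The sizes and random weights are assumptions for illustration; a real transformer adds learned weights, multiple heads, masking, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over X (seq_len x d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len x seq_len): how much each token attends to each other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8          # e.g. a 4-token sentence, 8-dim embeddings (made-up sizes)
X = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print("Output shape:", output.shape)     # one d_k-dim vector per token
print("Attention weights:\n", weights.round(2))
```

Because the whole sequence is handled with a few matrix multiplications, every position is processed at once, which is where the parallelism of transformers comes from.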

Large Language Model (LLM)

Large Language Models are AI models trained to handle various natural language processing (NLP) tasks by learning patterns from large datasets of text. They leverage deep learning techniques, often based on transformer architectures, to understand and generate human-like text.

Examples of LLMs
GPT-4 (Generative Pre-trained Transformer 4)
BERT (Bidirectional Encoder Representations from Transformers)
T5 (Text-To-Text Transfer Transformer)

How LLMs Are Built
1) Data Collection:
LLMs are trained on vast and diverse datasets that include books, articles, websites, and other text sources.

2) Model Architecture:
Most modern LLMs use transformer architectures, which rely on attention mechanisms to process and generate text.

3) Pre-training:

Objective: The model is trained on large amounts of text data using self-supervised learning, where the training targets come from the text itself (this is often loosely called unsupervised learning). Common objectives include predicting the next word in a sentence or filling in missing words.
Techniques: Examples include masked language modeling (MLM) for BERT and autoregressive modeling for GPT.
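The two pre-training objectives can be illustrated by how training examples are built from raw text. The sketch below uses a toy whitespace tokenizer (real LLMs use subword tokenizers) to create (context, next word) pairs for autoregressive modeling and a masked copy of the sentence for masked language modeling. The `[MASK]` token and the roughly 15% masking rate follow BERT's convention, but this is a simplified illustration, not BERT's exact masking procedure.

```python
import random

sentence = "the cat sat on the mat"
tokens = sentence.split()   # toy tokenizer: split on whitespace

# Autoregressive (GPT-style): predict each next word from the words before it
ar_examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in ar_examples:
    print(" ".join(context), "->", target)

# Masked language modeling (BERT-style): hide some words, predict them from both sides
random.seed(0)
n_mask = max(1, round(0.15 * len(tokens)))   # mask about 15% of the tokens
masked_positions = sorted(random.sample(range(len(tokens)), n_mask))
masked = ["[MASK]" if i in masked_positions else tok for i, tok in enumerate(tokens)]
targets = {i: tokens[i] for i in masked_positions}
print(" ".join(masked), "| targets:", targets)
```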

4) Fine-tuning:

Objective: The pre-trained model is further trained on specific tasks or datasets to improve its performance on particular applications, such as translation or sentiment analysis.

5) Evaluation:

Metrics: The model’s performance is evaluated using metrics such as accuracy, BLEU score (for translation), and F1 score (for classification).

Evaluation metrics

Evaluation metrics are crucial in machine learning and artificial intelligence for assessing the performance of models and algorithms. The choice of metric depends on the type of task (e.g., classification, regression, clustering).

  1. Classification Metrics:

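Accuracy
The ratio of correctly predicted observations to the total observations: (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.

Precision
The ratio of correctly predicted positive observations to the total predicted positives: TP / (TP + FP).

Recall (Sensitivity or True Positive Rate)
The ratio of correctly predicted positive observations to the total actual positives: TP / (TP + FN).

F1 Score
The harmonic mean of precision and recall, balancing both metrics: 2 × (Precision × Recall) / (Precision + Recall).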

ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

It measures the model's ability to distinguish between classes across all classification thresholds. The ROC curve plots the true positive rate against the false positive rate, and AUC is the area under that curve.

Range: 0 to 1 (1 indicates perfect classification; 0.5 is no better than random guessing).
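A small sketch, assuming scikit-learn is installed, that computes these metrics by hand from made-up labels and predictions and cross-checks them against scikit-learn. The ROC-AUC line uses the fact that AUC equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one (ties count as half).

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score

# Made-up labels (1 = positive) and model outputs for 8 samples
y_true   = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 1, 0, 1, 0, 1, 1, 0]                   # hard class predictions
y_scores = [0.9, 0.6, 0.4, 0.8, 0.2, 0.7, 0.65, 0.1]  # predicted probability of class 1

# Count the four confusion-matrix cells by hand
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# ROC-AUC = probability that a random positive is scored above a random negative
pos = [s for s, t in zip(y_scores, y_true) if t == 1]
neg = [s for s, t in zip(y_scores, y_true) if t == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.4f}")
# accuracy=0.625 precision=0.600 recall=0.750 f1=0.667 auc=0.8125
print("scikit-learn:", accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred), roc_auc_score(y_true, y_scores))
```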

Bias

The tendency of a model to consistently make errors in a particular direction, typically because its assumptions are too simple for the data.

High Bias: Leads to underfitting, where the model is too simple to capture the underlying patterns in the data.
Low Bias: Indicates a model that is more flexible and capable of fitting the training data well, but it can still suffer from high variance.
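High bias can be seen in a small experiment: fit polynomials of degree 1 and 2 to made-up data that actually follows a curve. The straight line cannot bend to follow the data, so its error stays high even on the training set, which is the signature of underfitting. The data and noise level are assumptions for illustration.

```python
import numpy as np

# Made-up data with a curved (quadratic) relationship plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = x ** 2 + rng.normal(0, 0.3, size=x.size)

def training_mse(degree):
    """Fit a polynomial of the given degree and return its error on the training data."""
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# A straight line is too simple for this curve (high bias, underfitting);
# a quadratic matches the true shape, so its training error is much lower
print("Degree 1 training MSE:", training_mse(1))
print("Degree 2 training MSE:", training_mse(2))
```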

How to measure bias?
The term bias is also used in a different sense: systematic unfairness in a model's behavior toward particular social groups. This kind of bias is usually measured with benchmark datasets, which are standardized datasets used to evaluate and compare the performance of algorithms or models within a specific domain. Benchmarks related to bias are designed specifically to evaluate fairness and bias in machine learning models.
Ex. StereoSet (designed to evaluate and address biases in natural language processing models)
CrowS-Pairs (focused on the perpetuation of stereotypes)

How to remove bias? (Debiasing)

Ex. AutoDebias (Automatically mitigate biases in machine learning models)

Machine Learning Packages

scikit-learn

It is a very popular library that contains implementations of many state-of-the-art machine learning algorithms, and it is the most prominent Python library for machine learning.

numpy

It is the fundamental package for scientific computing in Python. It provides multidimensional arrays, high-level mathematical functions such as linear algebra operations and the Fourier transform, and pseudorandom number generators.

matplotlib

It is the primary scientific plotting library in Python. It provides functions for making publication-quality visualizations such as line charts, histograms, and scatter plots.

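pandas

It is a Python library for data wrangling and analysis, built around the DataFrame, a table-like data structure with labeled rows and columns.

Install the packages with pip:

```shell
pip install numpy scipy matplotlib ipython scikit-learn pandas
```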

First code

```python
# Step 1: Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 2: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 5: Train a K-Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Step 6: Make predictions on the testing set
y_pred = knn.predict(X_test)

# Step 7: Evaluate the classifier
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```

Ensemble Model

It is a machine learning technique that combines the predictions of multiple individual models to improve overall performance. The main idea is that by aggregating multiple models, you can achieve better accuracy, robustness, and generalization compared to using any single model alone.

Types of Ensemble Methods

Bagging (Bootstrap Aggregating)

Multiple models (e.g., decision trees) are trained on different bootstrapped subsets of the data. The predictions are aggregated (e.g., by voting for classification or averaging for regression).

It reduces variance and helps in preventing overfitting.

Ex.

Random Forest

Bagged Decision Trees
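To see what bagging does under the hood, here is a hand-rolled sketch on the Iris dataset (the same split as the first code example): 25 decision trees, each trained on a bootstrap sample, combined by majority vote. The number of trees and the seeds are arbitrary choices; in practice scikit-learn's `BaggingClassifier` wraps this loop for you.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)
n_models = 25
all_preds = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) rows *with replacement*
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

all_preds = np.array(all_preds)   # shape: (n_models, n_test)
# Aggregate by majority vote: the most frequent class in each column wins
bagged_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])

print("Bagged accuracy:", (bagged_pred == y_test).mean())
```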

Boosting

Models are trained sequentially, where each new model tries to correct the errors of the previous ones. The predictions are combined, often with a weighted average.

It reduces bias and improves the accuracy of predictions by focusing on the errors of previous models.

Ex.
AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost.
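One way to see "each new model corrects the errors of the previous ones" is a hand-rolled gradient boosting sketch for regression with squared error: every new shallow tree is fit to the residuals of the ensemble so far, and its predictions are added, scaled by a small learning rate. The data, tree depth, learning rate, and number of rounds are made-up choices; libraries such as XGBoost and LightGBM add many refinements on top of this basic idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1-D regression data
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate, n_rounds = 0.1, 100
prediction = np.full_like(y, y.mean())   # start from a constant model
trees = []
for _ in range(n_rounds):
    residuals = y - prediction           # errors of the ensemble so far
    # The new model is trained to predict those errors
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # add a small, scaled correction
    trees.append(tree)

print("MSE of the initial constant model:", np.mean((y - y.mean()) ** 2))
print("MSE after boosting:", np.mean((y - prediction) ** 2))
```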

Stacking

Different models (base learners) are trained on the same data, and their predictions are used as inputs to a meta-learner, which makes the final prediction.

Combines multiple models to leverage their individual strengths and improve performance.

Ex.
Using logistic regression as a meta-learner with decision trees and SVMs as base learners.
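A minimal sketch of that example using scikit-learn's `StackingClassifier` on Iris: a decision tree and an SVM are the base learners and logistic regression is the meta-learner. `StackingClassifier` trains the meta-learner on cross-validated predictions of the base learners, so it does not simply learn from how well they fit their own training data. The hyperparameters (tree depth, `cv=5`) are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base learners: a decision tree and an SVM
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("svm", SVC(random_state=0)),
]
# Meta-learner: logistic regression trained on the base learners' cross-validated outputs
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```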

Voting

For classification, majority voting is used to choose the class with the most votes. For regression, the average of predictions is taken.

Types: Hard voting (majority class) and soft voting (average predicted probabilities).
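A small NumPy sketch of the difference between hard and soft voting, using made-up probabilities from three hypothetical models for a single sample. The two rules can disagree: hard voting counts only each model's top class, while soft voting also weighs how confident each model is. scikit-learn exposes both through `VotingClassifier(voting="hard")` and `VotingClassifier(voting="soft")`.

```python
import numpy as np

# Made-up predicted probabilities from three models for ONE sample, classes [0, 1]
probas = np.array([
    [0.45, 0.55],   # model A: leans to class 1
    [0.40, 0.60],   # model B: leans to class 1
    [0.95, 0.05],   # model C: very confident in class 0
])

# Hard voting: each model casts one vote for its most likely class
votes = probas.argmax(axis=1)                # [1, 1, 0]
hard_choice = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the highest
avg = probas.mean(axis=0)                    # about [0.60, 0.40]
soft_choice = avg.argmax()

print("Hard voting picks class", hard_choice)  # 1 (two votes to one)
print("Soft voting picks class", soft_choice)  # 0 (model C's confidence outweighs the others)
```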

Random Forest

It is a versatile and powerful machine learning algorithm that's used for both classification and regression tasks.

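The algorithm creates multiple decision trees by sampling the training data with replacement (bootstrap sampling). Each tree is trained on a slightly different dataset, and each split considers only a random subset of the features, which makes the trees less correlated with one another and helps reduce overfitting.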

For classification tasks, the final output is determined by majority voting among the individual trees. For regression tasks, the output is the average of the predictions from all trees.
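A minimal sketch with scikit-learn's `RandomForestRegressor` on made-up data, checking that the forest's prediction equals the average of its individual trees' predictions. One caveat worth knowing: for classification, scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard majority votes. The data, number of trees, and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up 1-D regression data
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# 100 trees, each fit on a bootstrap sample of the training data
forest = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)

X_new = np.array([[1.0], [3.0], [5.0]])
forest_pred = forest.predict(X_new)
# Average the individual trees' predictions by hand
tree_avg = np.mean([tree.predict(X_new) for tree in forest.estimators_], axis=0)

print("Forest prediction:       ", forest_pred)
print("Mean of tree predictions:", tree_avg)
print("True sin(x):             ", np.sin(X_new[:, 0]))
```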

Classification

Decision Trees
Random Forest
Naive Bayes
K-Nearest Neighbors (KNN)
Logistic Regression
Neural Networks
AdaBoost
Gradient Boosting Machines (GBM)
Support Vector Machines (SVM)
Quadratic Discriminant Analysis (QDA)

Regression

Linear Regression
Decision Tree Regression
Random Forest Regression
Bayesian Regression
K-Nearest Neighbors (KNN) Regression
Gradient Boosting Machines (GBM) for Regression

Clustering

K-Medoids
K-Means Clustering
Hierarchical Clustering
Gaussian Mixture Models (GMM)
Agglomerative Clustering
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
OPTICS (Ordering Points To Identify the Clustering Structure)
HDBSCAN (Hierarchical DBSCAN)

Why 255?

In standard grayscale and RGB images, pixel values are often stored as 8-bit integers. An 8-bit integer can represent 2^8 = 256 distinct values, so each pixel value (or each color channel in RGB) ranges from 0 to 255, where:

0 represents the minimum value (e.g., black in grayscale, or no intensity in that color channel in RGB).
255 represents the maximum value (e.g., white in grayscale, or full intensity of that color channel in RGB).
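This is why image pixels are commonly divided by 255 before being fed to a model: it maps the 0 to 255 range onto 0 to 1. A small NumPy sketch with a made-up 2x3 grayscale image:

```python
import numpy as np

# A tiny made-up 2x3 grayscale image stored as 8-bit unsigned integers
image = np.array([[0, 128, 255],
                  [64, 192, 32]], dtype=np.uint8)

print("Largest value a uint8 can hold:", np.iinfo(np.uint8).max)  # 2**8 - 1 = 255

# Dividing by 255 rescales pixels to the range [0, 1], a common preprocessing step
scaled = image.astype(np.float32) / 255.0
print(scaled)
print("Min:", scaled.min(), "Max:", scaled.max())   # 0.0 and 1.0
```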

Stay Connected!
If you enjoyed this post, don’t forget to follow me on social media for more updates and insights:

Twitter: madhavganesan
Instagram: madhavganesan
LinkedIn: madhavganesan
