mungaime-25
UNSUPERVISED LEARNING

INTRODUCTION TO UNSUPERVISED LEARNING

Unsupervised learning is a type of machine learning where the model is not given any labels. Instead, it tries to find patterns, structures, or relationships in the input data without human supervision.

MAIN CHARACTERISTICS OF UNSUPERVISED LEARNING

  1. No labeled outputs.
  2. The system learns patterns from raw data.
  3. Focuses on data exploration and dimensionality reduction.

TYPES OF UNSUPERVISED LEARNING
There are two main types of unsupervised learning:

  1. CLUSTERING
     Grouping similar data points together such that:
     • Points in the same cluster are very similar.
     • Points in different clusters are very different.

  2. DIMENSIONALITY REDUCTION
     Reducing the number of input variables while preserving key information (e.g., PCA, t-SNE).

Common Clustering Algorithms

1. K-Means Clustering

K: the number of clusters to form.

  • The algorithm places K centroids (central points) in the feature space
  • Assigns each data point to the nearest centroid
  • Recomputes each centroid as the mean of its assigned points, and repeats until the assignments stabilize

from sklearn.cluster import KMeans
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Income': [45, 54, 67, 120, 130, 150],
    'Spending': [50, 60, 65, 90, 85, 95]
})

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(data)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
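Choosing K is not always obvious. One common heuristic (not covered above) is the elbow method: fit K-Means for a range of K values and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply. A minimal sketch using the same sample data:

```python
from sklearn.cluster import KMeans
import pandas as pd

# Same sample data as above
data = pd.DataFrame({
    'Income': [45, 54, 67, 120, 130, 150],
    'Spending': [50, 60, 65, 90, 85, 95]
})

# Fit K-Means for several values of K and record the inertia
inertias = {}
for k in range(1, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(data)
    inertias[k] = km.inertia_

for k, v in inertias.items():
    print(f"K={k}: inertia={v:.1f}")
```

The "elbow" is the K where the curve bends: adding more clusters past it yields only small inertia gains.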

2. Hierarchical Clustering

  • Doesn’t require you to specify the number of clusters
  • Creates a tree of clusters (dendrogram)
  • You can "cut" the tree at any level to decide how many clusters you want

Types:
  • Agglomerative (Bottom-Up): start with individual points and merge them
  • Divisive (Top-Down): start with one cluster and split it

import matplotlib.pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram,linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Sample data
data = pd.DataFrame({
    'Age': [25, 30, 45, 35, 50, 23, 40, 60],
    'Income': [30000, 40000, 50000, 45000, 80000, 32000, 60000, 90000]
})

link = linkage(data, method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(link, labels=list(range(1, len(data) + 1)), orientation='top',
           distance_sort='ascending', show_leaf_counts=True)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()

# Apply Agglomerative Clustering with 3 clusters
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
data['Cluster'] = model.fit_predict(data)

# Visualize
plt.scatter(data['Age'], data['Income'], c=data['Cluster'], cmap='Accent')
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Agglomerative Clustering')
plt.show()

# Standardize the features so Age and Income contribute equally
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data[['Age', 'Income']])

# Cluster the standardized data
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(scaled_data)

# Evaluate with the silhouette score (closer to 1 means better-separated clusters)
score = silhouette_score(scaled_data, labels)
print(f'Silhouette Score: {score:.4f}')

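The idea of "cutting" the tree at any level, mentioned above, can also be done programmatically with SciPy's fcluster. A short sketch assuming the same Age/Income sample data:

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Same sample data as above
data = pd.DataFrame({
    'Age': [25, 30, 45, 35, 50, 23, 40, 60],
    'Income': [30000, 40000, 50000, 45000, 80000, 32000, 60000, 90000]
})

link = linkage(data, method='ward')

# Cut the tree so that at most 3 clusters remain
labels = fcluster(link, t=3, criterion='maxclust')
print("Cluster labels:", labels)
```

With criterion='maxclust', t sets the number of clusters directly; with criterion='distance', t would instead be the height at which the dendrogram is cut.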

Dimensionality Reduction – Finding Simplicity in Complexity

Principal Component Analysis (PCA)

  • Reduces many variables into fewer that still capture most of the information.
  • Helps visualize high-dimensional data in 2D or 3D.
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

print("Reduced shape:", reduced.shape)
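How much of the original information the two components retain can be checked with explained_variance_ratio_ (a short sketch continuing the Iris example above):

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data

pca = PCA(n_components=2)
pca.fit(X)

# Fraction of the total variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total retained:", pca.explained_variance_ratio_.sum())
```

For Iris, the first two components retain well over 90% of the variance, which is why a 2D plot of the reduced data is still informative.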
