mungaime-25
UNSUPERVISED LEARNING

INTRODUCTION TO UNSUPERVISED LEARNING

Unsupervised learning is a type of machine learning where the model is not given any labels. Instead, it tries to find patterns, structures, or relationships in the input data without human supervision.

MAIN CHARACTERISTICS OF UNSUPERVISED LEARNING

  1. No labeled outputs.
  2. The system learns patterns from raw data.
  3. Focuses on data exploration and dimensionality reduction.

TYPES OF UNSUPERVISED LEARNING
There are two main types of unsupervised learning:

  1. CLUSTERING
     Grouping similar data points together such that:
     • Points in the same cluster are very similar.
     • Points in different clusters are very different.

  2. DIMENSIONALITY REDUCTION
     Reducing the number of input variables while preserving key information (e.g., PCA, t-SNE).

Common Clustering Algorithms

1. K-Means Clustering

K: the number of clusters to form.

  • The algorithm places K centroids (central points) in the feature space
  • Assigns each data point to the nearest centroid
  • Recomputes each centroid as the mean of its assigned points, and repeats until the assignments stabilize

from sklearn.cluster import KMeans
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Income': [45, 54, 67, 120, 130, 150],
    'Spending': [50, 60, 65, 90, 85, 95]
})

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(data)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
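Choosing K is not always obvious. One common heuristic (not covered above) is the elbow method: fit K-Means for a range of K values and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply. A minimal sketch using the same sample data:

```python
from sklearn.cluster import KMeans
import pandas as pd

# Same sample data as above
data = pd.DataFrame({
    'Income': [45, 54, 67, 120, 130, 150],
    'Spending': [50, 60, 65, 90, 85, 95]
})

# Fit K-Means for several values of K and record the inertia
inertias = {}
for k in range(1, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(data)
    inertias[k] = km.inertia_

for k, v in inertias.items():
    print(f"K={k}: inertia={v:.1f}")
```

The "elbow" is the K where the curve bends: adding more clusters past it yields only small inertia gains.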

2. Hierarchical Clustering

  • Doesn’t require you to specify the number of clusters
  • Creates a tree of clusters (dendrogram)
  • You can "cut" the tree at any level to decide how many clusters you want

Types:
  • Agglomerative (Bottom-Up): start with individual points and merge them
  • Divisive (Top-Down): start with one cluster and split it

import matplotlib.pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram,linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Sample data
data = pd.DataFrame({
    'Age': [25, 30, 45, 35, 50, 23, 40, 60],
    'Income': [30000, 40000, 50000, 45000, 80000, 32000, 60000, 90000]
})

link = linkage(data, method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(link, labels=list(range(1, len(data) + 1)), orientation='top',
           distance_sort='ascending', show_leaf_counts=True)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()

# Apply Agglomerative Clustering with 3 clusters
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
data['Cluster'] = model.fit_predict(data)

# Visualize
plt.scatter(data['Age'], data['Income'], c=data['Cluster'], cmap='Accent')
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Agglomerative Clustering')
plt.show()

# Standardize the features so Age and Income contribute equally
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data[['Age', 'Income']])

# Cluster the standardized data
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(scaled_data)

# Evaluate with the silhouette score (closer to 1 means better-separated clusters)
score = silhouette_score(scaled_data, labels)
print(f'Silhouette Score: {score:.4f}')

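The idea of "cutting" the tree at any level, mentioned above, can also be done programmatically with SciPy's fcluster. A short sketch assuming the same Age/Income sample data:

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Same sample data as above
data = pd.DataFrame({
    'Age': [25, 30, 45, 35, 50, 23, 40, 60],
    'Income': [30000, 40000, 50000, 45000, 80000, 32000, 60000, 90000]
})

link = linkage(data, method='ward')

# Cut the tree so that at most 3 clusters remain
labels = fcluster(link, t=3, criterion='maxclust')
print("Cluster labels:", labels)
```

With criterion='maxclust', t sets the number of clusters directly; with criterion='distance', t would instead be the height at which the dendrogram is cut.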

Dimensionality Reduction – Finding Simplicity in Complexity

Principal Component Analysis (PCA)

  • Reduces many variables into fewer that still capture most of the information.
  • Helps visualize high-dimensional data in 2D or 3D.
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

print("Reduced shape:", reduced.shape)
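How much of the original information the two components retain can be checked with explained_variance_ratio_ (a short sketch continuing the Iris example above):

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data

pca = PCA(n_components=2)
pca.fit(X)

# Fraction of the total variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total retained:", pca.explained_variance_ratio_.sum())
```

For Iris, the first two components retain well over 90% of the variance, which is why a 2D plot of the reduced data is still informative.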
