DEV Community

Rupak Biswas


Demonstrating K-means Clustering on the Iris Dataset

K-means clustering was performed to evaluate which clusters can be derived from the features of the given dataset, yielding an unsupervised model. The explanatory variables included as inputs to the K-means clustering model were the petal length and petal width.

Python Code

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
iris.data
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

X = iris.data
X_act = iris.data  # keep an untouched copy of the full feature set
X = X.drop(['sepal length (cm)', 'sepal width (cm)'], axis=1)

from sklearn.preprocessing import MinMaxScaler

# Scale each petal feature to the [0, 1] range
scaler = MinMaxScaler()
X['Scaled_PL'] = scaler.fit_transform(X[['petal length (cm)']])
X['Scaled_PW'] = scaler.fit_transform(X[['petal width (cm)']])

X = X.drop(['petal length (cm)','petal width (cm)'],axis=1)

import matplotlib.pyplot as plt

plt.scatter(X['Scaled_PL'], X['Scaled_PW'])

Scatter plot showing the possible clusters between petal length and petal width


Output

from sklearn.cluster import KMeans
model = KMeans(n_clusters = 2)
model.fit(X)

predictions = model.predict(X)
X['clusters'] = predictions
X

Predictions show the cluster number assigned to each sample (here we have 2 clusters)
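As a quick sanity check (my addition, not part of the original post), the cluster sizes can be counted. The snippet below rebuilds the same scaled petal features; `random_state` and `n_init` are fixed here only for reproducibility:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

iris = load_iris(as_frame=True)
# Same preprocessing as above: keep only petal features, scale to [0, 1]
petals = iris.data[['petal length (cm)', 'petal width (cm)']]
petals_scaled = MinMaxScaler().fit_transform(petals)

model = KMeans(n_clusters=2, n_init=10, random_state=42)
clusters = model.fit_predict(petals_scaled)

# Number of samples that fell into each cluster
counts = pd.Series(clusters).value_counts()
print(counts)
```

One cluster should be noticeably smaller than the other, since the compact setosa group sits far from the remaining samples.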


cluster0 = X[['Scaled_PL', 'Scaled_PW']][X.clusters == 0]
cluster1 = X[['Scaled_PL', 'Scaled_PW']][X.clusters == 1]
centroids = model.cluster_centers_

plt.scatter(cluster0['Scaled_PL'], cluster0['Scaled_PW'], color="yellow", label="Cluster 0")
plt.scatter(cluster1['Scaled_PL'], cluster1['Scaled_PW'], color="orange", label="Cluster 1")
plt.scatter(centroids[:, 0], centroids[:, 1], marker="*", color="purple", label="Centroids")
plt.xlabel("petal length (scaled)")
plt.ylabel("petal width (scaled)")
plt.legend()

Scatter plot representing the two predicted clusters along with their centroids
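Since the Iris dataset ships with true species labels, the two predicted clusters can also be cross-tabulated against them. This validation step is my addition, not from the original post; the `random_state` and `n_init` values are arbitrary:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

iris = load_iris(as_frame=True)
petals_scaled = MinMaxScaler().fit_transform(
    iris.data[['petal length (cm)', 'petal width (cm)']])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(petals_scaled)
species = iris.target_names[iris.target]

# Rows: predicted cluster, columns: true species
ct = pd.crosstab(labels, species)
print(ct)
```

With K = 2, one cluster typically captures all of setosa while the other absorbs versicolor and virginica, which is a hint that a larger K may be worth testing.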


Finding the Elbow

SSE = []

for k in range(1, 11):
    test_model = KMeans(n_clusters=k)
    test_model.fit(X[['Scaled_PL', 'Scaled_PW']])
    SSE.append(test_model.inertia_)  # sum of squared distances to the nearest centroid

plt.plot(range(1, 11), SSE)  # plot SSE against K, not against the list index
plt.xlabel("K")
plt.ylabel("SSE")

Elbow plot showing SSE against K for the fitted K-means models


Here the elbow appears roughly in the 1-to-5 range of K (number of clusters). To get accurate predictions, the K value should be chosen from within this range.
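As a complementary check (not part of the original post), the silhouette score can narrow the choice within that range; higher values indicate better-separated clusters:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

iris = load_iris(as_frame=True)
petals_scaled = MinMaxScaler().fit_transform(
    iris.data[['petal length (cm)', 'petal width (cm)']])

scores = {}
for k in range(2, 6):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(petals_scaled)
    scores[k] = silhouette_score(petals_scaled, labels)
    print(k, round(scores[k], 3))
```

Printing the scores alongside the elbow plot makes the final choice of K much less of a guess.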
