K-means clustering was performed to see which clusters can be derived from the features of the given dataset, giving an unsupervised model. Of the available explanatory variables, petal length and petal width were used as the inputs to the K-means clustering model.
Python Code
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
iris = load_iris(as_frame=True)
iris.data
iris.feature_names
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
X = iris.data
X_act = iris.data
X = X.drop(['sepal length (cm)','sepal width (cm)'],axis=1)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X[['petal length (cm)']])
X['Scaled_PL'] = scaler.transform(X[['petal length (cm)']])
scaler.fit(X[['petal width (cm)']])
X['Scaled_PW'] = scaler.transform(X[['petal width (cm)']])
X = X.drop(['petal length (cm)','petal width (cm)'],axis=1)
plt.scatter(X['Scaled_PL'],X['Scaled_PW'])
Scatter plot of scaled petal length against scaled petal width, showing the possible clusters
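As a minimal sketch of what the scaling step does, the snippet below scales both petal columns in a single call and compares the result with the min-max formula (x - min) / (max - min) computed by hand. The names petal_cols, petals, scaled and manual are illustrative and are not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris(as_frame=True)
petal_cols = ['petal length (cm)', 'petal width (cm)']  # illustrative name
petals = iris.data[petal_cols]

# MinMaxScaler rescales each column independently to the range 0 to 1,
# so both columns can be fitted and transformed in one call.
scaled = MinMaxScaler().fit_transform(petals)

# The same rescaling written out by hand: (x - min) / (max - min) per column.
manual = (petals - petals.min()) / (petals.max() - petals.min())

print(scaled.min(axis=0), scaled.max(axis=0))
print(abs(scaled - manual.to_numpy()).max())  # largest difference between the two
```

Scaling matters here because K-means works on Euclidean distances, so a feature with a larger numeric range would otherwise dominate the clustering.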
from sklearn.cluster import KMeans
model = KMeans(n_clusters = 2)
model.fit(X)
predictions = model.predict(X)
X['clusters'] = predictions
X
The clusters column holds the predicted cluster label for each row (0 or 1 here, since n_clusters = 2)
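To make the meaning of a prediction concrete, the sketch below checks that each label is simply the index of the nearest centroid. It rebuilds the scaled data so that it runs on its own; the names points, distances and nearest, and the choice of n_init=10 and random_state=0, are assumptions added for reproducibility rather than settings from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris(as_frame=True)
points = MinMaxScaler().fit_transform(
    iris.data[['petal length (cm)', 'petal width (cm)']]
)

# random_state and n_init are assumed values, fixed so the run is repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
predictions = model.fit_predict(points)

# Euclidean distance from every point to every centroid: shape (150, 2).
distances = np.linalg.norm(
    points[:, None, :] - model.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

print((nearest == predictions).all())  # do the labels match the nearest centroid?
print(np.bincount(predictions))        # number of points in each cluster
```

The label numbers themselves are arbitrary: which group ends up as 0 and which as 1 can change between runs when no random_state is set.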
cluster0 = X[['Scaled_PL','Scaled_PW']][X.clusters == 0]
cluster1 = X[['Scaled_PL','Scaled_PW']][X.clusters == 1]
centroids = model.cluster_centers_
plt.scatter(cluster0['Scaled_PL'],cluster0['Scaled_PW'],color="yellow",label="Cluster1")
plt.scatter(cluster1['Scaled_PL'],cluster1['Scaled_PW'],color="orange",label="Cluster2")
plt.scatter(centroids[:,0],centroids[:,1],marker="*",color="purple",label="centroid")
plt.xlabel("petal length (scaled)")
plt.ylabel("petal width (scaled)")
plt.legend()
Scatter plot showing the 2 predicted clusters along with their centroids
Finding the Elbow
SSE = []
for i in range(1,11):
    test_model = KMeans(n_clusters=i)
    test_model.fit(X[['Scaled_PL','Scaled_PW']])
    SSE.append(test_model.inertia_)
# Plot SSE against the actual K values (1 to 10) rather than the list index (0 to 9)
plt.plot(range(1,11),SSE)
plt.xlabel("K")
plt.ylabel("SSE")
Elbow curve for the K-means model: SSE against the number of clusters K
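The SSE on the vertical axis is the value scikit-learn exposes as inertia_: the sum of squared distances from each point to its closest centroid. The sketch below recomputes it by hand for one model as a sanity check; K = 3, n_init=10 and random_state=0 are arbitrary assumptions chosen for illustration, and points and sse_manual are illustrative names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris(as_frame=True)
points = MinMaxScaler().fit_transform(
    iris.data[['petal length (cm)', 'petal width (cm)']]
)

# K=3 is an arbitrary choice for this check, not a recommendation.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Distance from every point to every centroid, then keep the closest one.
distances = np.linalg.norm(
    points[:, None, :] - model.cluster_centers_[None, :, :], axis=2
)
sse_manual = (distances.min(axis=1) ** 2).sum()

print(sse_manual, model.inertia_)
print(np.isclose(sse_manual, model.inertia_))
```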
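Here the elbow of the curve falls somewhere in the range of 1 to 5 clusters: the SSE drops steeply over the first few values of K and then levels off. To get meaningful clusters, K should be chosen from this range, close to the point where the curve bends. K = 1 simply puts every point into a single cluster, so the practical candidates are 2 to 5.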
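Reading the elbow off a plot is subjective, so one possible aid is to print how much the SSE falls with each extra cluster. This is only a rough sketch: the percentage drop is a heuristic rather than a formal test, and n_init=10, random_state=0 and the names ks, sse and points are assumptions that do not appear in the original code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris(as_frame=True)
points = MinMaxScaler().fit_transform(
    iris.data[['petal length (cm)', 'petal width (cm)']]
)

ks = list(range(1, 11))
# Fixed random_state (an assumption) so the SSE values are repeatable.
sse = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
    for k in ks
]

# Relative SSE reduction gained by going from K-1 to K clusters.
for k, prev, cur in zip(ks[1:], sse[:-1], sse[1:]):
    drop = 100 * (prev - cur) / prev
    print(f"K={k}: SSE={cur:.3f}, drop from K={k - 1}: {drop:.1f}%")
```

The K after which the drops become small relative to the earlier ones is a reasonable candidate for the elbow.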