<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yannawut Kimnaruk</title>
    <description>The latest articles on DEV Community by Yannawut Kimnaruk (@yannawut).</description>
    <link>https://dev.to/yannawut</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F904795%2F2229f3aa-2756-4d75-8363-061d353fe6b9.png</url>
      <title>DEV Community: Yannawut Kimnaruk</title>
      <link>https://dev.to/yannawut</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yannawut"/>
    <language>en</language>
    <item>
      <title>Clustering in PowerBI</title>
      <dc:creator>Yannawut Kimnaruk</dc:creator>
      <pubDate>Sun, 07 Aug 2022 14:08:00 +0000</pubDate>
      <link>https://dev.to/yannawut/clustering-in-powerbi-1h82</link>
      <guid>https://dev.to/yannawut/clustering-in-powerbi-1h82</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--asqON8ZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AXtrRuaW1iMXqo7mq5d9lCA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--asqON8ZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AXtrRuaW1iMXqo7mq5d9lCA.png" alt="" width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clustering Example&lt;/p&gt;

&lt;h3&gt;
  
  
  ❓ What is clustering?
&lt;/h3&gt;

&lt;p&gt;Clustering is a method of identifying groups of similar data points in a dataset, so that objects in the same group (called a cluster) have similar properties.&lt;/p&gt;

&lt;p&gt;Clustering is unsupervised learning since a label is not required for each object.&lt;/p&gt;
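
&lt;p&gt;To make the idea concrete, here is a minimal sketch in Python, assuming scikit-learn is installed. The six points are made up for illustration and are not from the dataset used later. The code passes no labels, yet the two groups are still separated.&lt;/p&gt;

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up points forming two visibly separate groups (illustrative data only).
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# No labels are passed in: the algorithm groups the points by proximity alone.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

print(labels)
```

&lt;p&gt;The label numbers themselves (0 or 1) are arbitrary. What matters is that the first three points share one label and the last three share the other.&lt;/p&gt;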

&lt;p&gt;&lt;strong&gt;Use cases&lt;/strong&gt; for clustering include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Anomaly detection, such as fraud detection or detecting defective mechanical parts&lt;/li&gt;
&lt;li&gt;  Customer segmentation for marketing purposes&lt;/li&gt;
&lt;li&gt;  Rideshare data analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🛣️ Clustering methods
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Auto clustering in Power BI&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;  2-dimension: Scatter plot&lt;/li&gt;
&lt;li&gt;  Multi-dimension: Table&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Python/R&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;  Visualization&lt;/li&gt;
&lt;li&gt;  Transformation&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  📥 Get data
&lt;/h3&gt;

&lt;p&gt;The dataset I will use is the Mall Customer Segmentation Data. It contains basic data about shop customers like Customer ID, age, gender, annual income, and spending score.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python/download?datasetVersionNumber=1"&gt;&lt;strong&gt;Kaggle: Your Home for Data Science&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After downloading the data, load the CSV file into Power BI.&lt;/p&gt;

&lt;p&gt;Next, you will see how to cluster this data in Power BI with each method.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 Method 1: Auto clustering in Power BI
&lt;/h3&gt;

&lt;p&gt;This method is the easiest one but it comes with some limitations.&lt;/p&gt;

&lt;p&gt;First, let’s see how to perform clustering on 2 parameters/dimensions.&lt;/p&gt;

&lt;h4&gt;
  
  
  2-dimension: Scatter plot
&lt;/h4&gt;

&lt;p&gt;In the Visualizations pane on the right-hand side, click on the scatter chart icon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J4SWrdxC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ATFAe6qfgEyQv_P6wt_4naA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J4SWrdxC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ATFAe6qfgEyQv_P6wt_4naA.png" alt="" width="280" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drag 3 parameters to the visualization fields as shown below (the Values field is required for clustering). I will cluster the data based on Age and Annual Income in this example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wmYN9Qo0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AcVqh3BW0qViOmVdlN1RagQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wmYN9Qo0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AcVqh3BW0qViOmVdlN1RagQ.png" alt="" width="175" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A scatter plot will be generated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BCV88q5Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A7pqYZy5dC3Q8_G-4nwemjA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BCV88q5Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A7pqYZy5dC3Q8_G-4nwemjA.png" alt="" width="800" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the 3-dot icon on the corner of the scatter chart (usually at the upper right corner) and select &lt;strong&gt;Automatically find clusters&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AQj8LrWH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AdIH7FnUd-InzheRSFOIPDA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AQj8LrWH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AdIH7FnUd-InzheRSFOIPDA.png" alt="" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the pop-up box, you can name your cluster and select the number of clusters. In this case, I will not set the number of clusters and will let Power BI choose it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0gEhq00k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AIjkAJBvzNlnqCnDdchc32A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0gEhq00k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AIjkAJBvzNlnqCnDdchc32A.png" alt="" width="744" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data is divided into 3 clusters, which Power BI considers the best number of clusters. Each cluster is shown in its own color. You can see that customers in the same cluster are plotted close to each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--asqON8ZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AXtrRuaW1iMXqo7mq5d9lCA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--asqON8ZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AXtrRuaW1iMXqo7mq5d9lCA.png" alt="" width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new cluster parameter is automatically created in the Legend field. You can also use this parameter for further analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pDHK-LRI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AZHTVkWNUP08cg8-m1xgnjA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pDHK-LRI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AZHTVkWNUP08cg8-m1xgnjA.png" alt="" width="273" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if you want to cluster on more than 2 parameters/dimensions? You may not be able to visualize the result as a scatter plot, but you can do it in a Table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Multi-dimension: Table
&lt;/h4&gt;

&lt;p&gt;First, click on the table icon in the Visualizations pane.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gwK7dUL_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AMIcClRqiSBVb-hIy7UFeDQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gwK7dUL_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AMIcClRqiSBVb-hIy7UFeDQ.png" alt="" width="280" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drag the parameters you want to cluster to the Values field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zta-kNmB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AZ1VWsnB3yuEDc8xaM0xfuQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zta-kNmB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AZ1VWsnB3yuEDc8xaM0xfuQ.png" alt="" width="179" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A table will be created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WbcHOuOy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ABt86CGyX-HF5-bcdy0lqrQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WbcHOuOy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ABt86CGyX-HF5-bcdy0lqrQ.png" alt="" width="363" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat the same step as for the scatter plot (&lt;strong&gt;Automatically find clusters&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EGYr8oh7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A_w_67DgkC-ZdkutZwlmazg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EGYr8oh7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A_w_67DgkC-ZdkutZwlmazg.png" alt="" width="470" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Done!! Clustering is completed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Limitations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;  Clusters don’t update on data refresh. New data will go into a blank cluster.&lt;/li&gt;
&lt;li&gt;  You have to deal with missing values and scale the values so that each parameter has the same range.&lt;/li&gt;
&lt;li&gt;  The clustering algorithm used in Power BI is the &lt;a href="https://docs.microsoft.com/en-us/analysis-services/data-mining/microsoft-clustering-algorithm-technical-reference?view=asallproducts-allversions"&gt;scalable EM algorithm&lt;/a&gt; (thank you &lt;a href="https://medium.com/u/e1fa33879a95"&gt;Calvin Nurge&lt;/a&gt; for providing the reference). It may not perform well on some types of data, and you can’t adjust it.&lt;/li&gt;
&lt;/ul&gt;
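
&lt;p&gt;The second limitation can be handled before clustering. Below is a rough sketch of one way to do it in Python with pandas and scikit-learn. The rows are made up, filling gaps with the median is only one possible choice, and inside Power BI the same preparation could equally be done in Power Query.&lt;/p&gt;

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up rows using the dataset's column names; one income value is missing.
df = pd.DataFrame({
    "Age": [19, 21, 35, 52],
    "Annual Income (k$)": [15.0, None, 60.0, 120.0],
    "Spending Score (1-100)": [39, 81, 50, 20],
})

# Assumption: fill missing values with the column median (dropping rows is another option).
filled = df.fillna(df.median())

# Rescale every column to the 0-1 range so no single parameter dominates the distance.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(filled), columns=df.columns)

print(scaled.round(2))
```

&lt;p&gt;After this step every column has a minimum of 0 and a maximum of 1, so Age, income, and spending score contribute on the same scale.&lt;/p&gt;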




&lt;h3&gt;
  
  
  🤖 Method 2: Python/R
&lt;/h3&gt;

&lt;p&gt;This method may be more complex but more flexible. You can write Python or R to perform clustering any way you want.&lt;/p&gt;

&lt;p&gt;With this method, the clusters can be refreshed when there is new data, and you can adjust the clustering algorithm.&lt;/p&gt;
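
&lt;p&gt;One example of such an adjustment is choosing the number of clusters from the data instead of fixing it by hand. The sketch below is an assumption about how this could be done, not part of the original workflow: it uses synthetic data and scikit-learn's silhouette_score to compare candidate values of k.&lt;/p&gt;

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three blobs of 30 points each (not the mall dataset).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Score each candidate number of clusters; a higher silhouette
# means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

&lt;p&gt;On real data the silhouette scores are usually much closer together, so treat the best-scoring k as a starting point rather than a final answer.&lt;/p&gt;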

&lt;p&gt;In this article, I will show only the Python method. However, the R implementation does not differ much from the Python one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can read how to use Python in Power BI below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://medium.com/mlearning-ai/python-in-power-bi-66a80590ecc0"&gt;&lt;strong&gt;Python in Power BI&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Once you have finished setting up Python, let’s start clustering.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are 2 ways to perform clustering with Python: Visualization and Transformation.&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 Visualization
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Using a Python visualization will create a graph in the dashboard. With this method, you will have a clustered graph and can adjust it with Python code, but you &lt;strong&gt;can’t use the clusters anywhere else&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the Visualizations pane, click the Py icon (short for Python).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WedfBvBq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A8Zez8pt1qniWOh_SFeu7lA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WedfBvBq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A8Zez8pt1qniWOh_SFeu7lA.png" alt="" width="282" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will see an empty Python script editor area. Select the columns you want to visualize (Annual Income and Spending Score in this example).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k86YbcbJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A8TLgMq34xslxY7RpLoUb4A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k86YbcbJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A8TLgMq34xslxY7RpLoUb4A.png" alt="" width="179" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Write the code below in the script editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Import libraries&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.cluster import KMeans
import matplotlib.pyplot as plt&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;2. Perform K-means clustering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cluster on Annual Income and Spending Score, divide the data into 5 clusters, and use fit_predict to fit the model and assign each row to a cluster.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;X = dataset[['Annual Income (k$)', 'Spending Score (1-100)']]
kmeansmodel = KMeans(n_clusters=5, init='k-means++', random_state=0)
y_kmeans = kmeansmodel.fit_predict(X)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;3. Visualization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a scatter plot and color the points by cluster.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;plt.scatter(X.iloc[y_kmeans == 0, 0], X.iloc[y_kmeans == 0, 1], s=100, c='tomato', label='Cluster 1')
plt.scatter(X.iloc[y_kmeans == 1, 0], X.iloc[y_kmeans == 1, 1], s=100, c='dodgerblue', label='Cluster 2')
plt.scatter(X.iloc[y_kmeans == 2, 0], X.iloc[y_kmeans == 2, 1], s=100, c='palegreen', label='Cluster 3')
plt.scatter(X.iloc[y_kmeans == 3, 0], X.iloc[y_kmeans == 3, 1], s=100, c='violet', label='Cluster 4')
plt.scatter(X.iloc[y_kmeans == 4, 0], X.iloc[y_kmeans == 4, 1], s=100, c='sandybrown', label='Cluster 5')
plt.scatter(kmeansmodel.cluster_centers_[:, 0], kmeansmodel.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The graph is illustrated below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OYBan15R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ArMMnFljthoUM9h0B7vyX7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OYBan15R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ArMMnFljthoUM9h0B7vyX7g.png" alt="" width="494" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph will be updated once there is new data.&lt;/p&gt;
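
&lt;p&gt;The five nearly identical plt.scatter calls above could also be written as a loop. The following is a sketch under that assumption. The random dataset and the Agg backend line are stand-ins so the script runs outside Power BI; drop them when pasting into the Python visual, where Power BI supplies dataset itself.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run outside Power BI; remove to open a plot window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the `dataset` DataFrame that Power BI passes to the script
# (random values, for illustration only).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 140, size=60),
    "Spending Score (1-100)": rng.integers(1, 101, size=60),
})

X = dataset[["Annual Income (k$)", "Spending Score (1-100)"]]
kmeansmodel = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
y_kmeans = kmeansmodel.fit_predict(X)

# One scatter call per cluster, driven by a list of colors instead of five copied lines.
colors = ["tomato", "dodgerblue", "palegreen", "violet", "sandybrown"]
for cluster_id, color in enumerate(colors):
    members = X[y_kmeans == cluster_id]
    plt.scatter(members.iloc[:, 0], members.iloc[:, 1], s=100, c=color,
                label=f"Cluster {cluster_id + 1}")

centroids = kmeansmodel.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c="yellow", label="Centroids")
plt.title("Clusters of customers")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.legend()
plt.show()
```

&lt;p&gt;Changing the number of clusters then only requires editing n_clusters and the colors list, keeping both the same length.&lt;/p&gt;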

&lt;h3&gt;
  
  
  🔄 Transformation
&lt;/h3&gt;

&lt;p&gt;This method is the most flexible one. You will perform clustering in the data transformation step and you can use the generated cluster in the dashboard.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Click Transform data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ocSDbKHH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ApqIBHqhq8D7-a8slrRCWHw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ocSDbKHH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ApqIBHqhq8D7-a8slrRCWHw.png" alt="" width="625" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Select the query you want to transform&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FXYOmEQR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AuTuKcpyCVRWx2jNPPpNZlA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FXYOmEQR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AuTuKcpyCVRWx2jNPPpNZlA.png" alt="" width="306" height="95"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;In the Transform tab, click Run Python script&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JJJuKIGD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AQjUQNCtrx65gpCVe55pcJA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JJJuKIGD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AQjUQNCtrx65gpCVe55pcJA.png" alt="" width="800" height="79"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;You will see a new Run Python script window. Copy the code below and click OK.&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.cluster import KMeans

# Cluster on annual income and spending score
X = dataset[['Annual Income (k$)', 'Spending Score (1-100)']]
kmeansmodel = KMeans(n_clusters=5, init='k-means++', random_state=0)

# Store each row's cluster label (0-4) in a new column
dataset['Cluster'] = kmeansmodel.fit_predict(X)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code runs K-means and creates 5 clusters. A new column, ‘Cluster’, is created to store the cluster assigned to each row.&lt;/p&gt;
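
&lt;p&gt;To check what the clusters mean before using them in the dashboard, the same script can be extended with a per-cluster summary. This is a sketch only: the random dataset stands in for the table Power BI supplies, and the profile table and its Customers column are names introduced here for illustration.&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the `dataset` DataFrame supplied by Power BI (random illustrative values).
rng = np.random.default_rng(1)
dataset = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 140, size=80),
    "Spending Score (1-100)": rng.integers(1, 101, size=80),
})

features = ["Annual Income (k$)", "Spending Score (1-100)"]
kmeansmodel = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
dataset["Cluster"] = kmeansmodel.fit_predict(dataset[features])

# Profile each cluster: average income and spending score, plus member count.
profile = dataset.groupby("Cluster")[features].mean().round(1)
profile["Customers"] = dataset["Cluster"].value_counts().sort_index()

print(profile)
```

&lt;p&gt;A summary like this makes it easier to give each cluster a meaningful name, such as high income with low spending.&lt;/p&gt;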

&lt;ol start="5"&gt;
&lt;li&gt;The result will be a table. Click to expand the table. Make sure that ‘Use original column name as prefix’ is not checked.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Oa-X3h2Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A92u8FUBC8fB-Z51XSa_uKQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Oa-X3h2Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A92u8FUBC8fB-Z51XSa_uKQ.png" alt="" width="589" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mNWY_rWw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AGZP-Lg0TnHM4dM5GslNbnQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mNWY_rWw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AGZP-Lg0TnHM4dM5GslNbnQ.png" alt="" width="376" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;The Cluster column is created. Its values range from 0 to 4, corresponding to the 5 clusters specified in step 4.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dJ3nvwXS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2Akl0SR33AUJ2Io3mkd9wLhw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dJ3nvwXS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2Akl0SR33AUJ2Io3mkd9wLhw.png" alt="" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="7"&gt;
&lt;li&gt;Click Close &amp;amp; Apply in the Home tab.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tQRMD888--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2Aq6aGGc6UTzEF2M1WL-Twtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tQRMD888--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2Aq6aGGc6UTzEF2M1WL-Twtw.png" alt="" width="671" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚠️ Make sure that after clustering, you change the column to the appropriate type.&lt;/p&gt;
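
&lt;p&gt;One possible way to handle the type, sketched below as an assumption rather than the only option, is to emit text labels directly from the Python step so the column is treated as a category instead of a number. The small dataset here is a stand-in for the table with the numeric Cluster column.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Stand-in for the numeric labels produced by fit_predict (illustrative values).
dataset = pd.DataFrame({"Cluster": np.array([0, 3, 1, 4, 2, 0])})

# Turn 0-4 into text labels so the column is treated as a category,
# not as a number that could be summed or averaged.
dataset["Cluster"] = "Cluster " + (dataset["Cluster"] + 1).astype(str)

print(dataset["Cluster"].tolist())
```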

&lt;p&gt;The Cluster column will automatically refresh when you refresh the data.&lt;/p&gt;

&lt;p&gt;You can use this Cluster parameter anywhere in the dashboard. Mostly, it will be used as a legend or filter.&lt;/p&gt;




&lt;p&gt;This article is quite long since I want to cover all clustering methods in Power BI.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed reading this article. Please follow me for more data analytics content.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>powerbi</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
