<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aldi Fianda Putra</title>
    <description>The latest articles on DEV Community by Aldi Fianda Putra (@aldifp01).</description>
    <link>https://dev.to/aldifp01</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F688601%2F8ae05e9a-4810-4d25-80c3-339f04c1da99.jpeg</url>
      <title>DEV Community: Aldi Fianda Putra</title>
      <link>https://dev.to/aldifp01</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aldifp01"/>
    <language>en</language>
    <item>
      <title>Automate App Deployment with Cloud Build</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Mon, 02 Jun 2025 03:37:29 +0000</pubDate>
      <link>https://dev.to/aldifp01/automate-app-deployment-with-cloud-build-2cdb</link>
      <guid>https://dev.to/aldifp01/automate-app-deployment-with-cloud-build-2cdb</guid>
      <description>&lt;p&gt;After revisiting the basics of cloud infrastructure in my previous post, I figured it's a good time to talk about something equally essential: CI/CD.&lt;/p&gt;

&lt;p&gt;In the past few years working with Google Cloud, I've had to set up multiple CI/CD pipelines—some simple, some complex. I’ve found that using Cloud Build makes it straightforward to automate deployments and integrate well with other GCP services.&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through how to build a CI/CD pipeline using Cloud Build, and how it fits into a modern DevOps workflow on GCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Cloud Build?
&lt;/h3&gt;

&lt;p&gt;Cloud Build is a serverless CI/CD platform from Google Cloud that lets you build, test, and deploy your code directly from repositories like GitHub or Cloud Source Repositories. It supports custom build steps using Docker images and integrates smoothly with other GCP services like Cloud Run, GKE, and Artifact Registry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdexu79l3wc9va2rn1tpr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdexu79l3wc9va2rn1tpr.png" alt="Image description" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up Service Account and Required Roles for Cloud Build Deployment
&lt;/h3&gt;

&lt;p&gt;Before using Cloud Build, you need a service account for it to run as — either the default Cloud Build service account or a custom one. Then, assign the necessary roles based on your build requirements.&lt;/p&gt;

&lt;p&gt;In this post, since I will deploy the application to App Engine — a fully managed platform by Google Cloud that automatically handles infrastructure, scaling, and load balancing for web applications — I assign the following roles to the service account:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App Engine Deployer, for deploying the app&lt;/li&gt;
&lt;li&gt;Storage Admin, because the staging process uploads source files to Cloud Storage&lt;/li&gt;
&lt;li&gt;Logs Writer, to allow writing build logs to Cloud Logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb45xezds3uoqgo1pqhqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb45xezds3uoqgo1pqhqj.png" alt="Image description" width="800" height="284"&gt;&lt;/a&gt;&lt;br&gt;
After creating the service account, the next step is to set up a connection to our code repository. In this example, I’m connecting to GitLab, so I create a host connection to my GitLab repository.&lt;/p&gt;

&lt;p&gt;Once connected, you will see something like the image above, where the connection name and the linked repository appear in Cloud Build.&lt;/p&gt;
&lt;h3&gt;
  
  
  Creating a CI/CD Trigger in Cloud Build
&lt;/h3&gt;

&lt;p&gt;To create a CI/CD trigger, go to the Cloud Build page and select the Triggers menu, then click "+ Create Trigger".&lt;/p&gt;

&lt;p&gt;On this page, you can specify the event that will invoke the build trigger. These events can be repository-related, such as a push to a branch, a new tag push, or a pull request. Alternatively, the trigger can respond to non-repository events like manual invocation, Pub/Sub messages, or webhook events.&lt;/p&gt;

&lt;p&gt;Next, select the repository to connect. Then, choose the build configuration method—whether based on a Cloud Build config file, a Dockerfile, or Buildpacks. In this example, I will use a Cloud Build configuration file (cloudbuild.yaml).&lt;/p&gt;

&lt;p&gt;As an additional note, you can also pass values into the build through the Substitution variables option. User-defined substitution names must begin with an underscore (for example, _VARIABLE); names without the leading underscore (like VARIABLE) are reserved for Cloud Build's built-in substitutions.&lt;/p&gt;
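&lt;p&gt;For example, a user-defined substitution can be given a default in the config file and referenced inside a step like this (a minimal sketch; the _DEPLOY_ENV name and its value are made up for illustration):&lt;/p&gt;

```yaml
steps:
  - name: 'node'
    entrypoint: 'bash'
    # $_DEPLOY_ENV is resolved by Cloud Build before the step runs
    args: ['-c', 'echo "Deploying to $_DEPLOY_ENV"']
substitutions:
  _DEPLOY_ENV: 'staging'   # default; a trigger can override this value
```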

&lt;p&gt;Finally, select the service account that you created earlier to run the build.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cloud Build Configuration for Deployment
&lt;/h3&gt;

&lt;p&gt;The build configuration file (cloudbuild.yaml) is stored in the code repository alongside the application source code. This allows Cloud Build to automatically detect and use the configuration whenever a build is triggered.&lt;/p&gt;

&lt;p&gt;Since in this example I am using App Engine to deploy a Node.js application, besides the Cloud Build configuration, there is also an app.yaml file.&lt;/p&gt;

&lt;p&gt;The app.yaml file is the configuration file for App Engine that defines deployment settings such as the runtime environment, instance class, service name, and URL handlers.&lt;/p&gt;

&lt;p&gt;Here is an example of the app.yaml content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env: standard          # Specifies the App Engine environment (standard or flexible)
runtime: nodejs20      # Defines the runtime environment (Node.js version 20)
instance_class: F2     # Specifies the instance class which controls CPU and memory resources
service: testing       # Names the service/module for this deployment
handlers:              # Defines how URL paths are handled
  - url: /.*           # Matches all URL paths
    secure: always     # Enforces HTTPS for all requests
    script: auto       # Automatically selects the script to run (default behavior for Node.js)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is an example of a Cloud Build configuration (cloudbuild.yaml) that runs several build steps before deploying the app to App Engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;steps:
  - name: 'node'
    entrypoint: 'yarn'
    args: ['cache', 'clean']
  - name: 'node'
    entrypoint: 'yarn'
    args: ['install']
  - name: 'node'
    entrypoint: 'yarn'
    args: ['build']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'bash'
    args: ['-c', 'gcloud config set app/cloud_build_timeout 1600 &amp;amp;&amp;amp; gcloud app deploy']
timeout: '1600s'
options:
  logging: CLOUD_LOGGING_ONLY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Explanation of each step:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Clean cache: yarn cache clean clears the local yarn cache to ensure a fresh install of dependencies.&lt;/li&gt;
&lt;li&gt;Install dependencies: yarn install installs the project dependencies defined in package.json.&lt;/li&gt;
&lt;li&gt;Build: yarn build runs the build script, typically compiling or bundling the app for production.&lt;/li&gt;
&lt;li&gt;Deploy: the Google Cloud SDK container runs a bash command that raises the App Engine deployment timeout and deploys the app with gcloud app deploy.&lt;/li&gt;
&lt;li&gt;timeout: '1600s' sets the maximum time the build may run before being cancelled (1600 seconds, i.e. 26 minutes 40 seconds).&lt;/li&gt;
&lt;li&gt;options.logging: CLOUD_LOGGING_ONLY sends the build logs only to Cloud Logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Monitoring Build Progress in Cloud Build
&lt;/h3&gt;

&lt;p&gt;You can monitor the progress and status of your builds in the Build History page of Cloud Build. This page provides detailed logs for each build step, allowing you to troubleshoot issues or confirm successful deployments. Each entry shows timestamps, build results, and any error messages if the build fails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkq93w0mg3gbay9sjch9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkq93w0mg3gbay9sjch9i.png" alt="Image description" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz26zxtq5duvtkgle61py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz26zxtq5duvtkgle61py.png" alt="Image description" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this setup, you now have a complete CI/CD pipeline that automates building and deploying your Node.js application to App Engine. By integrating Cloud Build with your repository and configuring triggers, deployment becomes consistent, repeatable, and easier to manage. In future posts, I plan to explore more advanced topics such as environment variable management, artifact storage, and secret handling to further optimize the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/build/docs/overview" rel="noopener noreferrer"&gt;Cloud Build Documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/build/docs/building/build-nodejs" rel="noopener noreferrer"&gt;Build Node.js application&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloudcomputing</category>
      <category>googlecloud</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>Landing Zone: A Blueprint for Cloud Deployments</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Tue, 22 Apr 2025 02:31:46 +0000</pubDate>
      <link>https://dev.to/aldifp01/landing-zone-a-blueprint-for-cloud-deployments-48h7</link>
      <guid>https://dev.to/aldifp01/landing-zone-a-blueprint-for-cloud-deployments-48h7</guid>
      <description>&lt;p&gt;After working in the cloud industry for over two years—mostly with Google Cloud Platform—I began to think that I need to start documenting my experiences. Mainly as a personal reference, but hopefully others might find them useful too.&lt;/p&gt;

&lt;p&gt;This is my first post in a long while (the last one was about two years ago). To kick things off, I want to start with something fundamental: Landing Zone in the cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is A Landing Zone?
&lt;/h3&gt;

&lt;p&gt;A Landing Zone is an initial framework or foundation set up in the cloud (such as Google Cloud, AWS, or Azure) to enable organizations to deploy applications and services in a secure, structured, and scalable manner. Landing Zones are dynamic and evolve as your organization adopts more cloud-based workloads over time.&lt;/p&gt;

&lt;p&gt;When creating a Landing Zone, there is no one-size-fits-all implementation. Each organization is free to design its Landing Zone based on specific business needs. The design is usually influenced by several factors, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Industry type&lt;/li&gt;
&lt;li&gt;Organizational structure and internal processes&lt;/li&gt;
&lt;li&gt;Security and compliance requirements&lt;/li&gt;
&lt;li&gt;Types of workloads to be migrated to the cloud&lt;/li&gt;
&lt;li&gt;Existing infrastructure (either on-premises or in another cloud)&lt;/li&gt;
&lt;li&gt;Business and customer locations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgu1ejm04ki2wyf9d7nx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgu1ejm04ki2wyf9d7nx.png" alt="" width="800" height="805"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Sample of Landing Zone Architecture. Image credit: Google Cloud&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The image above shows an example of Landing Zone architecture on Google Cloud. It highlights how different components come together to create a secure, scalable, and well-managed environment for deploying workloads in the cloud.&lt;/p&gt;

&lt;p&gt;The core elements you should include when designing a Landing Zone are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity provisioning&lt;/li&gt;
&lt;li&gt;Resource hierarchy&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;Security controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On top of that, you can also add other components based on your business needs, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring and logging&lt;/li&gt;
&lt;li&gt;Backup and disaster recovery&lt;/li&gt;
&lt;li&gt;Compliance&lt;/li&gt;
&lt;li&gt;Cost efficiency and budget control&lt;/li&gt;
&lt;li&gt;API management&lt;/li&gt;
&lt;li&gt;Cluster management&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Automating Landing Zone Creation — Is It Possible?
&lt;/h3&gt;

&lt;p&gt;A Landing Zone, as previously explained, is the framework and foundation of our cloud infrastructure. On top of it, we can set up our environment using whichever method we prefer, such as the GCP Console or Cloud Shell. However, if we need to configure many projects, doing so manually is time-consuming, so we can automate the setup of Landing Zone services using Terraform. To give you a better idea of how this works in practice, here is a simple example of using Terraform to create a basic Landing Zone setup on Google Cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;main.tf&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

resource "google_project" "landing_zone_project" {
  name            = var.project_name
  project_id      = var.project_id
  org_id          = var.org_id
  billing_account = var.billing_account_id
}

resource "google_project_service" "enabled_apis" {
  for_each = toset([
    "compute.googleapis.com",
    "iam.googleapis.com",
    "cloudresourcemanager.googleapis.com"
  ])
  project = google_project.landing_zone_project.project_id
  service = each.key
}

resource "google_project_iam_member" "project_owner" {
  project = google_project.landing_zone_project.project_id
  role    = "roles/owner"
  member  = "user:${var.user_email}"
}

resource "google_compute_network" "vpc_network" {
  name                    = "landing-zone-vpc"
  auto_create_subnetworks = true
  project                 = google_project.landing_zone_project.project_id
}

resource "google_compute_instance" "vm_instance" {
  name         = "landing-zone-instance"
  machine_type = "e2-medium"
  zone         = var.zone
  project      = google_project.landing_zone_project.project_id

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = google_compute_network.vpc_network.name
    access_config {
      # Ephemeral external IP
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This main.tf template lays the groundwork for a basic Google Cloud Landing Zone. It starts by configuring the Google provider using project, region, and zone variables. Then, it creates a new GCP project under your organization, connects it to a billing account, and enables a few essential APIs like Compute Engine, IAM, and Cloud Resource Manager. To ensure proper access, it assigns the "Owner" role to a specified user email. After that, it sets up a default VPC network with automatic subnet creation. Finally, it launches a simple VM instance running Debian 11 and connects it to the VPC. This setup gives you a clean starting point to build more infrastructure on GCP. Since dev.to has a content limit, I’m not including the other Terraform files like variables.tf and terraform.tfvars here.&lt;/p&gt;
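&lt;p&gt;For reference, a minimal variables.tf to pair with the main.tf above could look like the following. Note this is my own sketch, not the original file; the types and defaults are assumptions:&lt;/p&gt;

```hcl
# Hypothetical variables.tf declaring every variable referenced in main.tf
variable "project_id"         { type = string }
variable "project_name"       { type = string }
variable "org_id"             { type = string }
variable "billing_account_id" { type = string }
variable "user_email"         { type = string }

variable "region" {
  type    = string
  default = "us-central1"
}

variable "zone" {
  type    = string
  default = "us-central1-a"
}
```

&lt;p&gt;With the values filled in (for example via terraform.tfvars), the usual terraform init, terraform plan, and terraform apply cycle provisions the whole setup.&lt;/p&gt;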

&lt;p&gt;That’s a wrap! This setup might be simple, but it’s a solid starting point if you’re looking to build your own Landing Zone on GCP. There’s still a lot you can add and customize depending on your needs, but hopefully this gives you a head start. If you're trying it out or have suggestions, feel free to chat in the comments. Thanks for stopping by!&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/architecture/landing-zones" rel="noopener noreferrer"&gt;Landing Zones on Google Cloud&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloudcomputing</category>
      <category>googlecloud</category>
      <category>devops</category>
      <category>terraform</category>
    </item>
    <item>
      <title>ETL Process from NoSQL to SQL Database</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Fri, 03 Mar 2023 16:28:01 +0000</pubDate>
      <link>https://dev.to/aldifp01/etl-process-on-nosql-and-mysql-database-4pek</link>
      <guid>https://dev.to/aldifp01/etl-process-on-nosql-and-mysql-database-4pek</guid>
      <description>&lt;h3&gt;
  
  
  Project Description
&lt;/h3&gt;

&lt;p&gt;This is a project I did for a Data Engineering course in college: running some ETL processing on a Pokemon dataset, which can be accessed via this &lt;a href="https://github.com/fanzeyi/pokemon.json/" rel="noopener noreferrer"&gt;link&lt;/a&gt;. In this project, I created ETL processes to fix certain things in the dataset, such as names, numbering, and missing values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataset Description
&lt;/h3&gt;

&lt;p&gt;This dataset covers the data found in the Pokemon video games, from Gen I (Pokemon Red, Green, Blue) through Gen VII (Pokemon Sun &amp;amp; Moon, Let's Go). It consists of four tables:&lt;/p&gt;

&lt;h4&gt;
  
  
  Pokedex
&lt;/h4&gt;

&lt;p&gt;The Pokedex table is a sort of encyclopedia of all the Pokemon in the Pokemon universe, containing general statistics for each Pokemon.&lt;/p&gt;

&lt;h4&gt;
  
  
  Moves
&lt;/h4&gt;

&lt;p&gt;The moves table contains the skills or attacks that a Pokemon can learn, often taught through TMs and HMs. Each move has its own statistics, and a Pokemon can know at most four moves.&lt;/p&gt;

&lt;h4&gt;
  
  
  Types
&lt;/h4&gt;

&lt;p&gt;The types table lists the 18 types that Pokemon and their movesets can have.&lt;/p&gt;

&lt;h4&gt;
  
  
  Items
&lt;/h4&gt;

&lt;p&gt;The items table contains items that can be used directly by players or held by Pokemon during battle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table Relation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5z8osxdcxx52es3xy0t4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5z8osxdcxx52es3xy0t4.png" alt="rel_database" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The relationships between the tables are shown in the image above: each Pokemon is registered in the Pokedex, has at most two types, knows at most four moves, and can hold an item.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extract
&lt;/h3&gt;

&lt;p&gt;The first step is the extraction process. Here I also created a logging function to record which processes have run; it can be seen in the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!touch logfile.txt
from datetime import datetime

def log(pesan):
    # Append a timestamped message to logfile.txt
    timestamp_format = '%Y-%h-%d-%H:%M:%S'
    now = datetime.now()
    timestamp = now.strftime(timestamp_format)
    with open("logfile.txt", "a") as f:
        f.write(timestamp + ',' + pesan + '\n')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
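&lt;p&gt;As a quick sanity check, here is a self-contained sketch of the same logging pattern. It writes to a temporary file instead of logfile.txt and uses the portable %m month code in place of %h, which is not guaranteed on every platform:&lt;/p&gt;

```python
import os
import tempfile
from datetime import datetime

def log(pesan, path):
    # Append a timestamped message, mirroring the notebook's log() helper
    timestamp = datetime.now().strftime('%Y-%m-%d-%H:%M:%S')
    with open(path, "a") as f:
        f.write(timestamp + ',' + pesan + '\n')

path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
log("ETL process started", path)
log("Extract step finished", path)

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # 2
```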



&lt;p&gt;The next step is to download the dataset from the GitHub link attached above. It is NoSQL-style data, stored as keys and values, so it is first converted into Pandas dataframe format. While converting, we log progress by calling the log function created earlier. The code for downloading the files and converting them to dataframes can be seen below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Downloading the dataset
!curl -LJO https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/items.json
!curl -LJO https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/moves.json
!curl -LJO https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/pokedex.json
!curl -LJO https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/types.json

# Extract function
def extract_from_json(file_to_process):
    dataframe = pd.read_json(file_to_process)
    return dataframe

# Calling the extract function
log("ETL process started")
log("Extract step: extracting 4 JSON tables into dataframes")
df_1 = extract_from_json('pokedex.json')
df_2 = extract_from_json('moves.json')
df_3 = extract_from_json('types.json')
df_4 = extract_from_json('items.json')
log("Extraction of the JSON tables finished")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv8twsfepbd1gtzwjkmb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv8twsfepbd1gtzwjkmb.png" alt="extract" width="692" height="410"&gt;&lt;/a&gt;&lt;br&gt;
The picture above shows the result of displaying the df_1 dataframe to check the output. It has four features: id, name, type, and base stats. Notice that name, type, and base stats are wrapped in curly braces, with each value separated by a colon; these features store data as keys and values. A transformation process is therefore needed to pull out only the data we need from this NoSQL-style structure.&lt;/p&gt;
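&lt;p&gt;The flattening trick used throughout the transform step can be shown on a tiny, made-up dataframe shaped like pokedex.json:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical two-row sample with a nested 'name' column
df = pd.DataFrame({
    "id": [1, 2],
    "name": [
        {"english": "Bulbasaur", "japanese": "Fushigidane"},
        {"english": "Ivysaur", "japanese": "Fushigisou"},
    ],
})

# .apply(pd.Series) expands each dict into one column per key
names = df["name"].apply(pd.Series)
names = names.drop("japanese", axis=1).rename(columns={"english": "Name"})

# Drop the nested column and join the flattened one back in
flat = df.drop("name", axis=1).join(names)
print(list(flat.columns))     # ['id', 'Name']
print(flat["Name"].tolist())  # ['Bulbasaur', 'Ivysaur']
```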
&lt;h3&gt;
  
  
  Transform
&lt;/h3&gt;

&lt;p&gt;In the transformation stage, the steps are different for each table, as follows.&lt;/p&gt;

&lt;p&gt;On the pokedex table, the transformations performed are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing the Japanese, Chinese, and French names&lt;/li&gt;
&lt;li&gt;Flattening the type attribute&lt;/li&gt;
&lt;li&gt;Flattening the base stats attribute&lt;/li&gt;
&lt;li&gt;Deleting the nested attributes from the initial dataframe&lt;/li&gt;
&lt;li&gt;Merging the fixed attributes back into one dataframe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the moves table, the transformations performed are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing the Japanese and Chinese names, plus the tm and max_pp columns, because they do not appear on the original reference page (Bulbapedia)&lt;/li&gt;
&lt;li&gt;Translating the values in the category column, which are still in Chinese&lt;/li&gt;
&lt;li&gt;Filling NaN accuracy values with 100.0, because in the game NaN means the move never misses&lt;/li&gt;
&lt;li&gt;Renaming the ename column to name&lt;/li&gt;
&lt;li&gt;Reordering the columns to match the Bulbapedia reference&lt;/li&gt;
&lt;li&gt;Fixing Flying moves that were recorded as Fighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the types table, the transformations performed are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing the Japanese and Chinese names, then renaming the english column to name&lt;/li&gt;
&lt;li&gt;Creating an id column starting from 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the items table, the transformation performed is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flattening the name attribute and only taking English item names&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The functions for carrying out the transformations can be seen in the following code snippet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def transform_pokedex(df):
    log("Starting transformation of the pokedex table")
    # Remove the Japanese, Chinese, and French names
    df_1_name = pd.DataFrame.from_dict(df)
    df_1_name = df_1_name['name'].apply(pd.Series)
    df_1_name = df_1_name.drop(['japanese','chinese','french'],axis=1)
    df_1_name = df_1_name.rename(columns={'english': 'Name'})

    # Flatten the type attribute
    df_1_type = pd.DataFrame.from_dict(df)
    df_1_type = df_1_type['type'].apply(pd.Series)
    df_1_type = df_1_type.rename(columns={0: 'Type 1', 1: 'Type 2'})
    df_1_type = df_1_type.fillna('')

    # Flatten the base attribute
    df_1_stats= pd.DataFrame.from_dict(df)
    df_1_stats = df_1_stats['base'].apply(pd.Series)

    # Delete the nested attributes from the initial dataframe
    df = df.drop(['name', 'base', 'type'], axis = 1)

    # Merge the fixed attributes back into one dataframe
    df = df.join(df_1_name)
    df = df.join(df_1_type)
    df = df.join(df_1_stats)

    log("Transformation of the pokedex table succeeded")
    return df

def transform_moves(df):
    log("Starting transformation of the moves table")
    # Remove the Japanese and Chinese names plus tm and max_pp, since they
    # are not present on the original reference page (Bulbapedia)
    df_2_fix = pd.DataFrame.from_dict(df)
    df_2_fix = df_2_fix.drop(['id','jname','cname','tm','max_pp'],axis=1)

    # Translate the values in the 'category' column that are still in Chinese
    df_2_fix = df_2_fix.replace('物理','Physical')
    df_2_fix = df_2_fix.replace('特殊','Special')
    df_2_fix = df_2_fix.replace('变化','Status')

    # Fill NaN with 100.0; in the game NaN means infinite accuracy,
    # i.e. the move always succeeds
    df_2_fix = df_2_fix.fillna(100.0)
    # Rename the ename column (and tidy the other column names)
    df_2_fix = df_2_fix.rename(columns={'ename': 'Move','type': 'Type','category':'Category'
                                       ,'pp': 'PP','power':'Power','accuracy':'Accuracy'})

    # Order the columns to match the Bulbapedia reference
    df_2_fix = df_2_fix[['Move', 'Type', 'Category','PP','Power','Accuracy']]

    df = df_2_fix
    df = df.reset_index()
    df = df.rename(columns={"index":"id move"})
    df['id move'] = df.index + 1

    log("Transformation of the moves table succeeded")
    return df

def transform_moves_2(df):
    log("Starting the fix for all Flying movesets previously typed as Fighting")
    flying_move = 331,176,402,339,412,447,431,64,296,18,15,541,118,607,63,364,142,506,365,16

    # Fix the Flying movesets that were originally recorded as Fighting
    for x in flying_move:
        df.loc[x] = df.loc[x].replace(to_replace="Fighting", value = "Flying")

    log("Transformation fixing the Flying-type movesets succeeded")
    return df

def transform_type(df):
    log("Starting transformation of the types table")
    # Remove the Japanese and Chinese names, then rename the 'english'
    # column to 'Name'
    df_3_fix = pd.DataFrame.from_dict(df)
    df_3_fix = df_3_fix.drop(['chinese','japanese'],axis=1)
    df_3_fix = df_3_fix.rename(columns={'english': 'Name'})

    df = df_3_fix

    # Create an id column starting from 1
    df = df.reset_index()
    df = df.rename(columns={"index":"id"})
    df['id'] = df.index + 1
    df = df.rename(columns={"id":"id type"})
    log("Transformation of the types table succeeded")
    return df

def transform_items(df):
    log("Starting transformation of the items table")
    # Flatten the 'name' attribute and keep only the English item names
    df_4_name = pd.DataFrame.from_dict(df)
    df_4_name = df_4_name['name'].apply(pd.Series)
    df_4_name = df_4_name.drop(['japanese','chinese'],axis=1)

    df_4_fix = df.join(df_4_name)
    df = df_4_fix
    df = df.drop(['id','name'],axis=1)

    df = df.rename(columns={'english': 'Name'})
    df['id']= range(0, 0+len(df))

    df=df[['id','Name']]
    df = df.rename(columns={"id":"id item"})
    log("Transformation of the items table succeeded")
    return df
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following snippet then calls the transform functions on each dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log("Starting transformation of the tables by flattening the data")
pokedex_fix = transform_pokedex(df_1)
moves_fix = transform_moves(df_2)
moves_fix = transform_moves_2(moves_fix)
types_fix = transform_type(df_3)
items_fix = transform_items(df_4)
log("Transformation of all tables succeeded")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzh6lxmpecwyrqfxlzlm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzh6lxmpecwyrqfxlzlm.png" alt="transform" width="590" height="374"&gt;&lt;/a&gt;&lt;br&gt;
The picture above shows the result of the transformation on the pokedex table: it now has the features id, name, Type 1, Type 2, HP, Attack, Defense, Sp.Attack, Sp.Defense, and Speed. For Pokemon with a single type, Type 2 contains an empty string.&lt;/p&gt;
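Replacing a missing second type with an empty string can be sketched like this (toy data, not the real pokedex):

```python
import numpy as np
import pandas as pd

# Toy pokedex: single-type pokemon end up with NaN in "Type 2" after
# flattening, so fillna("") turns that into an empty string.
pokedex = pd.DataFrame({"Name": ["Pikachu", "Bulbasaur"],
                        "Type 1": ["Electric", "Grass"],
                        "Type 2": [np.nan, "Poison"]})
pokedex["Type 2"] = pokedex["Type 2"].fillna("")
```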

&lt;p&gt;Because each Pokemon in this project is meant to have four moves and a held item, another transformation assigns random moves to each Pokemon, with the condition that a move must share a type with the Pokemon. The function that assigns the random moves is shown in the following code snippet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def addmove4(final_pkmn,awal,akhir):
    log("Memulai transaksi 4 move kepada semua pokemon dalam pokedex")
    count = 0
    for i in range(awal,akhir):
        moves_added = 0
        while(moves_added&amp;lt;4):
            count_same = 0
            rand=random.randint(0,611)
            random_name= moves_fix.iloc[rand]['Move']
            random_type = moves_fix.iloc[rand]['Type']
            t_random=random_type
            t1 = pokedex_fix["Type 1"][i]
            t2 = pokedex_fix["Type 2"][i]
            if (t1==t_random) or (t2 == t_random):
                if(moves_added&amp;lt;4):
                    id_pkmn = pokedex_fix.iloc[i]['id']
                    nama_pkmn = pokedex_fix.iloc[i]['Name']
                    moveset=random_name
                    tambah = {"id" : id_pkmn,"Move" : moveset}
                    count += 1
                    moves_added += 1
                    final_pkmn=final_pkmn.append(tambah,ignore_index=True) 
                    log("Menambahkan moveset "+moveset+"ke pokemon bernama "+nama_pkmn)
                else:
                    for x in range(moves_added):
                        if random_name==final_pkmn.iloc[count-x]['Move']:
                            count_same=0
                        else : 
                            id_pkmn = pokedex_fix.iloc[i]['id']
                            nama_pkmn = pokedex_fix.iloc[i]['Name']
                            moveset=random_name
                            tambah = {"id" : id_pkmn,"Move" : moveset}
                            count += 1
                            moves_added += 1
                            final_pkmn=final_pkmn.append(tambah,ignore_index=True)  
                            log("Menambahkan moveset "+moveset+"ke pokemon bernama "+nama_pkmn)
            else :
                 continue
    logggg ="Proses transaksi moveset dari pokemon dengan index "+str(awal)+" hingga "+str(akhir)+" berhasil"
    log(logggg)
    print(logggg)
    return final_pkmn

log("Declaring the final table used to JOIN several tables")
final_pkmn = pd.DataFrame(columns=["id", "Move"])

log("Starting the move transaction for all 809 pokemon")
final_pkmn = addmove4(final_pkmn, 0, 809)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pd7oo5uby6lkbjl7vee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pd7oo5uby6lkbjl7vee.png" alt="randmove" width="170" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dataframe obtained after the transformation is shown above: the Pokemon with ID 1 has four moves, and so on. The final step is to merge all the tables.&lt;br&gt;
&lt;/p&gt;
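Merging on a shared key with pd.merge, as done below, defaults to an inner join, and a key that appears several times on one side fans out into one output row per match. A toy example:

```python
import pandas as pd

# pd.merge on "id" is an inner join by default: only ids present in both
# frames survive, and repeated ids produce one row per match.
left = pd.DataFrame({"id": [1, 2], "Name": ["Bulbasaur", "Ivysaur"]})
right = pd.DataFrame({"id": [1, 1], "Move": ["Tackle", "Vine Whip"]})
joined = pd.merge(left, right, on="id")
print(len(joined))  # → 2
```

Ivysaur (id 2) has no matching move rows, so it disappears from the inner-join result.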

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def join_final_transform(final_pkmn):
    log("Memulai proses JOIN pada beberapa tabel")
    log("Memulai proses LEFT JOIN pada tabel pokedex dengan item -&amp;gt; pokemon memiliki hold item")
    pokemon_w_item = pokedex_fix
    item_join = items_fix
    item_join = item_join.rename(columns={"id":"Item id","Name":"Item Name"})
    pokemon_w_item = pokemon_w_item.join(item_join)
    log("Proses LEFT JOIN berhasil dilakukan")

    log("Memulai proses FULL OUTER JOIN antara tabel pokemon yang sudah memiliki hol item dengan transaksi move")
    final_pkmn = pd.merge(pokemon_w_item,final_pkmn,on='id')
    final_pkmn = pd.merge(final_pkmn,moves_fix,on="Move")
    final_pkmn = final_pkmn.sort_values(by=['id'])
    final_pkmn = final_pkmn.reset_index()
    final_pkmn = final_pkmn.drop("index", axis=1)
    log("Proses FULL OUTER JOIN antara tabel pokemon dengan transaksi move berhasil dilakukan")

    return final_pkmn

log("Melakukan transformasi final yakni melakukan proses join")
final_pkmn = join_final_transform(final_pkmn)
log("Semua proses transformasi berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y0y6o8ogs7jx33u1f0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y0y6o8ogs7jx33u1f0n.png" alt="final" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final dataset after the last transformation is shown above. For example, Bulbasaur has four moves with different stats and does not hold an item.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load
&lt;/h3&gt;

&lt;p&gt;During the load process, I load the data into two different databases: SQLite and MySQL. The process of loading data into the SQLite database is shown in the following code snippet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log("Membuat connection ke database sqlite dengan API sqlite3 beserta cursornya")
connection = s3.connect('preprocessed_pokemon.db')
cursor = connection.cursor()
engine = create_engine('sqlite:///home/stndb01/Documents/Data Engineering/UTS/preprocessed_pokemon.db')  

log("Meload semua tabel ke dalam datamart sqlite3")
final_pkmn.to_sql(name='Pokemon_Final', con=connection, if_exists='replace', index=False)
pokedex_fix.to_sql(name='Pokedex', con=connection, if_exists='replace', index=False)
moves_fix.to_sql(name='Moves', con=connection, if_exists='replace', index=False)
types_fix.to_sql(name='Type', con=connection, if_exists='replace', index=False)
items_fix.to_sql(name='Items', con=connection, if_exists='replace', index=False)
log("Proses load tabel ke datamart sqlite3 berhasil")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Meanwhile, loading into the MySQL database works in much the same way. The main difference lies in where the data is stored: SQLite writes to a local .db file, while MySQL stores it on localhost, where it can be inspected through phpMyAdmin. The process of loading data into the MySQL database is shown in the following code snippet.&lt;br&gt;
&lt;/p&gt;
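The two targets differ mainly in their connection URLs; a sketch of both (the MySQL user and password below are placeholders):

```python
from sqlalchemy import create_engine

# SQLite: the database is a single local file. A relative path takes three
# slashes; an absolute path takes four (e.g. sqlite:////home/user/db.db).
sqlite_engine = create_engine("sqlite:///preprocessed_pokemon.db")

# MySQL: the database lives on a server (localhost here). The credentials
# are placeholders, and the pymysql driver must be installed to connect.
mysql_url = "mysql+pymysql://user:password@127.0.0.1/preprocessed_pokemon"
```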

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log("Membuat connection ke database mysql dengan API mysql beserta cursornya")
engine_mysql = create_engine('mysql+pymysql://phpmyadmin:bruh@127.0.0.1/preprocessed_pokemon')
connection_mysql = engine_mysql.raw_connection()
my_cursor = connection_mysql.cursor()

log("Meload semua tabel ke dalam datamart mysql lokal")
final_pkmn.to_sql('Pokemon_Final', con = engine_mysql, if_exists='replace', index=False)  
pokedex_fix.to_sql('Pokedex', con = engine_mysql, if_exists='replace', index=False)  
moves_fix.to_sql('Moves', con = engine_mysql, if_exists='replace', index=False)  
types_fix.to_sql('Type', con = engine_mysql, if_exists='replace', index=False)  
items_fix.to_sql('Items', con = engine_mysql, if_exists='replace', index=False)  
log("Proses load tabel ke datamart mysql lokal berhasil")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SQLite and MySQL share a very similar SQL dialect, so the same query string works for both; the difference is how Python executes it. SQLite queries can be run with pandas' read_sql, while MySQL queries are run through engine_mysql and the execute function. The following are examples of queries and calls for each database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log("Menampilkan isi dari tabel dari datamart sqlite3")
sql_read_1 = ''' SELECT * FROM Pokemon_Final'''
sql_read_2 = ''' SELECT * FROM Pokedex'''
sql_read_3 = ''' SELECT * FROM Moves'''
sql_read_4 = ''' SELECT * FROM Type'''
sql_read_5 = ''' SELECT * FROM Items'''

def read_table_sqlite3(query):
    return pd.read_sql(query,connection)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  SQLite
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flulwud73rsviwqq2sg0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flulwud73rsviwqq2sg0a.png" alt="SQLite" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  MySQL
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvlyuc83aqymrl6h2w2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvlyuc83aqymrl6h2w2z.png" alt="MySQL" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CRUD
&lt;/h3&gt;

&lt;p&gt;The data stored in the SQLite and MySQL databases can be modified from the Python environment. The following are examples of insert, update, and delete operations on the SQLite and MySQL databases via Python.&lt;/p&gt;
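All the examples below use parameterized statements ("?" placeholders in sqlite3, "%s" in MySQL), so values are passed separately from the SQL text. A self-contained sketch against an in-memory SQLite database:

```python
import sqlite3

# An in-memory database: nothing touches disk, so it is safe to experiment with.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Moves (id INTEGER, Move TEXT)")
# The "?" placeholders keep the values out of the SQL string itself.
conn.execute("INSERT INTO Moves VALUES (?, ?)", (613, "Bruh"))
row = conn.execute("SELECT Move FROM Moves WHERE id = ?", (613,)).fetchone()
print(row)  # → ('Bruh',)
```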

&lt;h4&gt;
  
  
  SQLite
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Insert
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Insert secara banyak ke tabel pokedex
log("Melakukan proses insert pada tiga tabel di sqlite3 dengan masing-masing 3 data")
new_pokemon = [(810,"Kadal Bersayap","Flying","Dragon",78,89,42,54,58,85),
              (811,"Naga Indosiar","Flying","Dragon",68,78,67,25,52,35),
              (810,"Kuda Bertanduk","Normal","Fairy",48,42,25,34,78,44)]
cursor.executemany('INSERT INTO Pokedex VALUES (?,?,?,?,?,?,?,?,?,?)', new_pokemon)

#Insert satu-satu
cursor.execute('INSERT INTO Moves VALUES (613,"Bruh","Normal","Special",5.0,100.0,90.0)')
cursor.execute('INSERT INTO Moves VALUES (614,"Fus Ro Dah","Dragon","Special",8.0,90.0,100.0)')
cursor.execute('INSERT INTO Moves VALUES (615,"Totsugeki","Flying","Physical",40.0, 100.0, 100.0)')

new_pokemon_w_move = [(810,"Kadal Bersayap","Flying","Dragon",78,89,42,54,58,85,1,"Master Ball","Totsugeki",615,"Flying","Physical",40.0, 100.0, 100.0),
                     (811,"Naga Indosiar","Flying","Dragon",68,78,67,25,52,35,1,"Master Ball","Fus Ro Dah",614,"Dragon","Special",8.0,90.0,100.0),
                     (810,"Kuda Bertanduk","Normal","Fairy",48,42,25,34,78,44,1,"Master Ball","Bruh",613,"Normal","Special",5.0,100.0,90.0)]
cursor.executemany('INSERT INTO Pokemon_Final VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', new_pokemon_w_move)
connection.commit()
log("Proses insert pada sqlite3 berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Update
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Update data yang sudah ada
log("Melakukan update data pada yang sudah ada pada db sqlite3")
update_pokemon_1 = ("Mega-Bulbasaur","Bulbasaur") 
update_1 = '''UPDATE Pokemon_Final set Name = ? WHERE Name LIKE ?'''
cursor.execute(update_1,update_pokemon_1)

update_pokemon_2 = ("Poison","Melmetal")
update_2 =  '''UPDATE Pokemon_Final set "Type 2" = ? WHERE Name LIKE ?'''
cursor.execute(update_2,update_pokemon_2)

update_pokemon_3 = ("Mega-Charizard","Dragon","Charizard")
update_3 =  '''UPDATE Pokemon_Final set Name = ?, "Type 1" = ?  WHERE Name LIKE ?'''
cursor.execute(update_3,update_pokemon_3)
connection.commit()
log("Update data pada db sqlite3 berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Delete
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Delete data yang sudah ada
log("Melakukan delete pada data yang sudah ada pada db sqlite3")
cursor.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Mega-Bulbasaur"')
cursor.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Ivysaur"')
cursor.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Venusaur"')
cursor.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Melmetal"')
cursor.execute('DELETE FROM Pokemon_Final WHERE Name = "Kadal Bersayap"')
connection.commit()
log("Delete data pada db sqlite3 berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  MySQL
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Insert
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Insert data secara banyak ke mysql
log("Melakukan insert data pada db mysql")
insert_my_1 = "INSERT INTO Pokedex VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
nilai_ins_1 = [(810,"Kadal Bersayap","Flying","Dragon",78,89,42,54,58,85),
              (811,"Naga Indosiar","Flying","Dragon",68,78,67,25,52,35),
              (810,"Kuda Bertanduk","Normal","Fairy",48,42,25,34,78,44)]
my_cursor.executemany(insert_my_1,nilai_ins_1)

#Insert data satu persatu
engine_mysql.execute('INSERT INTO Moves VALUES (613,"Bruh","Normal","Special",5.0,100.0,90.0)')
engine_mysql.execute('INSERT INTO Moves VALUES (614,"Fus Ro Dah","Dragon","Special",8.0,90.0,100.0)')
engine_mysql.execute('INSERT INTO Moves VALUES (615,"Totsugeki","Flying","Physical",40.0, 100.0, 100.0)')


insert_my_3 = "INSERT INTO Pokemon_Final VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
nilai_ins_3 = [(810,"Kadal Bersayap","Flying","Dragon",78,89,42,54,58,85,1,"Master Ball","Totsugeki",615,"Flying","Physical",40.0, 100.0, 100.0),
                     (811,"Naga Indosiar","Flying","Dragon",68,78,67,25,52,35,1,"Master Ball","Fus Ro Dah",614,"Dragon","Special",8.0,90.0,100.0),
                     (810,"Kuda Bertanduk","Normal","Fairy",48,42,25,34,78,44,1,"Master Ball","Bruh",613,"Normal","Special",5.0,100.0,90.0)]
my_cursor.executemany(insert_my_3,nilai_ins_3)

connection_mysql.commit()
log("Insert data pada db mysql berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Update
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Update data yang sudah ada
log("Melakukan update data pada yang sudah ada pada db mysql")
update_my_1 = "UPDATE Pokemon_Final set Name = %s WHERE Name LIKE %s"
nilai_upd_1 = ("Mega-Bulbasaur","Bulbasaur") 
my_cursor.execute(update_my_1,nilai_upd_1)

update_my_2 =  "UPDATE Pokemon_Final set `Type 2` = %s WHERE Name LIKE %s"
nilai_upd_2 = ("Poison","Melmetal")
my_cursor.execute(update_my_2,nilai_upd_2)

update_my_3 =  "UPDATE Pokemon_Final set Name = %s, `Type 1` = %s  WHERE Name LIKE %s"
nilai_upd_3 = ("Mega-Charizard","Dragon","Charizard")
my_cursor.execute(update_my_3,nilai_upd_3)
connection_mysql.commit()
log("Update data pada db mysql berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Delete
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Delete data yang sudah ada
log("Melakukan delete pada data yang sudah ada pada db mysql")
engine_mysql.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Mega-Bulbasaur"')
engine_mysql.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Ivysaur"')
engine_mysql.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Venusaur"')
my_cursor.execute('DELETE FROM Pokemon_Final WHERE Name LIKE "Melmetal"')
my_cursor.execute('DELETE FROM Pokemon_Final WHERE Name = "Kadal Bersayap"')
connection_mysql.commit()
log("Delete data pada db mysql berhasil dilakukan")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>datascience</category>
    </item>
    <item>
      <title>Classification for Myocardial Infarction Dataset</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Wed, 01 Mar 2023 15:12:27 +0000</pubDate>
      <link>https://dev.to/aldifp01/classification-for-myocardial-infarction-dataset-k80</link>
      <guid>https://dev.to/aldifp01/classification-for-myocardial-infarction-dataset-k80</guid>
      <description>&lt;h3&gt;
  
  
  What is Myocardial Infarction
&lt;/h3&gt;

&lt;p&gt;Myocardial infarction, commonly known as a heart attack, is a very serious heart condition. It occurs when the heart muscle does not get adequate blood flow. This condition disrupts cardiac function and blood flow throughout the body, which can be fatal.&lt;/p&gt;

&lt;p&gt;This complication can be detected by a tool called an electrocardiogram, or EKG. It records the heartbeat's wave signal, where each signal is classified into three components: Q, R, and S. There are 11 kinds of heartbeat waves, and from these waves a person's heart disease can be classified.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Description
&lt;/h3&gt;

&lt;p&gt;This is a project to classify a person's heart disease from the Myocardial Infarction Complications dataset, which can be accessed from this &lt;a href="https://s3-eu-west-1.amazonaws.com/pstorage-leicester-213265548798/23581310/MyocardialinfarctioncomplicationsDatabase.csv" rel="noopener noreferrer"&gt;link&lt;/a&gt;. In this project, the models used are Naive Bayes, Decision Tree, and Support Vector Machine. However, there are some problems with the dataset: it contains missing values, and most features are not normalized. Therefore, imputation needs to be done, and normalization is then carried out using min-max normalization. This project has also been published in an article that can be accessed on this &lt;a href="https://www.researchgate.net/publication/357507286_Algoritma_Decision_Tree_Dan_Smote_Untuk_Klasifikasi_Serangan_Jantung_Miokarditis_Yang_Imbalance" rel="noopener noreferrer"&gt;page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The dataset has categorical and non-categorical attributes. The non-categorical attributes will be normalized. They include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S_AD_KBRIG: Systolic blood pressure (mmHg)&lt;/li&gt;
&lt;li&gt;D_AD_KBRIG: Diastolic blood pressure (mmHg)&lt;/li&gt;
&lt;li&gt;S_AD_ORIT: Systolic blood pressure from ICU (mmHg)&lt;/li&gt;
&lt;li&gt;D_AD_ORIT: Diastolic blood pressure from ICU (mmHg)&lt;/li&gt;
&lt;li&gt;K_BLOOD: the amount of serum potassium in the patient's blood (mmol/L)&lt;/li&gt;
&lt;li&gt;NA_BLOOD: total serum sodium in the patient's blood (mmol/L)&lt;/li&gt;
&lt;li&gt;ALT_BLOOD: amount of AIAT serum in the patient's blood (IU/L)&lt;/li&gt;
&lt;li&gt;AST_BLOOD: amount of serum AsAT in the patient's blood (IU/L)&lt;/li&gt;
&lt;li&gt;KFK_BLOOD: amount of serum CPK in the patient's blood (IU/L)&lt;/li&gt;
&lt;li&gt;L_BLOOD: the number of white blood cells present in the patient&lt;/li&gt;
&lt;li&gt;ROE: the amount of ESR or high blood sedimentation rate experienced by the patient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last attribute, "LET_IS", is the lethal outcome: the classification of the heart condition suffered by the patient, based on the symptoms and complications experienced. There are 8 classes, represented by numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0 indicates the patient is healthy.&lt;/li&gt;
&lt;li&gt;1 indicates the patient has cardiogenic shock.&lt;/li&gt;
&lt;li&gt;2 indicates the patient has pulmonary edema.&lt;/li&gt;
&lt;li&gt;3 indicates the patient is classified as having a myocardial rupture.&lt;/li&gt;
&lt;li&gt;4 indicates the patient has congestive heart failure.&lt;/li&gt;
&lt;li&gt;5 indicates the patient has thromboembolism.&lt;/li&gt;
&lt;li&gt;6 indicates the patient has asystole.&lt;/li&gt;
&lt;li&gt;7 indicates that the patient has ventricular fibrillation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Preprocessing
&lt;/h3&gt;

&lt;p&gt;The first step is initial data processing, which involves steps such as checking the data for missing values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np
!wget https://s3-eu-west-1.amazonaws.com/pstorage-leicester-213265548798/23581310/MyocardialinfarctioncomplicationsDatabase.csv
data = pd.read_csv('MyocardialinfarctioncomplicationsDatabase.csv')
data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr7ktxml84jpib4nhfyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr7ktxml84jpib4nhfyh.png" alt="Missing Value" width="800" height="281"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def cetak_rentang(df_input):
  list_fitur=df_input.columns[:-1]
  for fitur in list_fitur:
    max=df_input[fitur].max()
    min=df_input[fitur].min()
    print("Rentang fitur", fitur, "Adalah", max-min)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36nnwnrdg8q95qjv3z13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36nnwnrdg8q95qjv3z13.png" alt="Feature Range" width="698" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the two outputs above, there are missing values (marked NaN), and the feature ranges differ from feature to feature. Thus, it is necessary to impute and normalize the data.&lt;/p&gt;

&lt;p&gt;To impute the data, use the following function and then call it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def imputasi(df_input):
  list_columns = df_input.columns
  class_column = list_columns[-1]  
  for column in list_columns[:-1] :
    df_input[column] = df_input[column].fillna(round(df_input.groupby(class_column)[column].transform('mean'),0))
    df_input[column] = df_input[column].fillna(df_input[column].mean())
  return df_input
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
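The imputation above fills each missing value with the mean of its class (falling back to the overall mean when the class-wise mean is itself missing). Its effect on a toy frame:

```python
import numpy as np
import pandas as pd

# Each NaN in "x" is replaced by the mean of the rows sharing its class label.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan], "cls": [0, 0, 1, 1]})
df["x"] = df["x"].fillna(df.groupby("cls")["x"].transform("mean"))
print(df["x"].tolist())  # → [1.0, 1.0, 3.0, 3.0]
```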



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ut1qkdvpclcecqb6bqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ut1qkdvpclcecqb6bqh.png" alt="No NaN" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the picture above, there are no more missing values. The next step is min-max normalization, so that the data has a range between 0 and 1. However, only continuous (non-categorical) features are normalized; discrete features such as age do not need to be normalized. Normalization can be done with the following code:&lt;br&gt;
&lt;/p&gt;
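MinMaxScaler applies the min-max formula (x - min) / (max - min) to each column; computed by hand on a small series:

```python
import pandas as pd

# Min-max normalization maps the smallest value to 0 and the largest to 1.
s = pd.Series([10.0, 20.0, 30.0])
normed = (s - s.min()) / (s.max() - s.min())
print(normed.tolist())  # → [0.0, 0.5, 1.0]
```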

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kolomfitur = ['S_AD_KBRIG','D_AD_KBRIG','S_AD_ORIT','D_AD_ORIT','K_BLOOD','NA_BLOOD',
              'ALT_BLOOD','AST_BLOOD','KFK_BLOOD','L_BLOOD','ROE']
from sklearn.preprocessing import MinMaxScaler
normalisasi = MinMaxScaler()
data[kolomfitur] = normalisasi.fit_transform(data[kolomfitur])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhry3pvuzv3qdsth06km3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhry3pvuzv3qdsth06km3.png" alt="Normalized Feature" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because the data no longer contains missing values and non-categorical features have been normalized. So the next step is to carry out the classification process on the dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Classification
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Model Building
&lt;/h4&gt;

&lt;p&gt;The models in this project were built using scikit-learn. The code below builds the Naive Bayes, Decision Tree, and SVM models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report

models = []
models.append(('NB', GaussianNB()))
models.append(('DT', DecisionTreeClassifier()))
models.append(('SVM', SVC(gamma='auto')))

results = []
names = []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Training Data and Validation data
&lt;/h4&gt;

&lt;p&gt;The next stage is to split the data into training and test sets. The split used is 80:20.&lt;br&gt;
&lt;/p&gt;
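The 80:20 split is done with train_test_split below; its effect can be sketched without scikit-learn by shuffling row indices and slicing:

```python
import random

# Shuffle the row indices with a fixed seed, then take the first 80% for training.
idx = list(range(10))
random.Random(1).shuffle(idx)
cut = int(len(idx) * 0.8)
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))  # → 8 2
```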

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;array = data.values
X = array[:,0:123]
y = array[:,123]

X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1, shuffle=True)

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import warnings
warnings.filterwarnings("ignore")

for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Model Prediction
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Naive Bayes
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = GaussianNB()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
print(classification_report(Y_validation, predictions))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cak9o73zw0fceobjd6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cak9o73zw0fceobjd6x.png" alt="Naive Bayes" width="421" height="248"&gt;&lt;/a&gt;&lt;br&gt;
From the picture above, it can be seen that Naive Bayes has an accuracy of 0.48. This accuracy is low compared to the other methods, and classes 4, 6, and 7 get very low precision, recall, and F1-scores of 0.0.&lt;/p&gt;

&lt;h5&gt;
  
  
  Decision Tree
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = DecisionTreeClassifier()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
print(classification_report(Y_validation, predictions))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83m682k4haxiobcrim73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83m682k4haxiobcrim73.png" alt="DT" width="421" height="248"&gt;&lt;/a&gt;&lt;br&gt;
From the picture above, it can be seen that the Decision Tree has an accuracy of about 0.99, which is high compared to the other methods. Even so, class 7 still gets very low precision, recall, and F1-scores of 0.0.&lt;/p&gt;

&lt;h5&gt;
  
  
  SVM
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X_train, Y_train)
model = SVC(gamma='auto')
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
X, y = make_classification(n_samples=340, n_features=20, n_informative=15, n_redundant=5, random_state=1)
print(classification_report(Y_validation, predictions))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju20ghgyyg4b2hxwm04s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju20ghgyyg4b2hxwm04s.png" alt="SVM" width="421" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the picture above, it can be seen that SVM gets an accuracy of 0.92. This is high, but below the Decision Tree, placing this method in the middle of the three. Classes 2, 5, and 6 get very low precision, recall, and F1-scores of 0.0.&lt;/p&gt;
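SMOTE's core idea, independent of the imbalanced-learn implementation, is to synthesize a new minority-class sample on the line segment between a minority point and one of its nearest minority neighbors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, 2.0])   # a minority-class sample
b = np.array([2.0, 3.0])   # one of its nearest minority neighbors
lam = rng.random()         # random interpolation factor in [0, 1)
# The synthetic sample lies somewhere between a and b.
synthetic = a + lam * (b - a)
```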

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;MIC, or Myocardial Infarction Complications, is a dataset of patients who experienced heart disease and its possible complications. Pre-processing this dataset is crucial: missing values, unnormalized feature values, and other deficiencies need to be corrected so that predictions on the data are optimal. Finally, after modeling with three different algorithms, namely Naive Bayes, Decision Tree, and SVM, the Decision Tree proved to be the best algorithm for this MIC dataset because it achieves higher accuracy than the other algorithms.&lt;/p&gt;

</description>
      <category>datascience</category>
    </item>
    <item>
      <title>Insight Findings from A Case Study of Customers Leaving Credit Card Services</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Mon, 27 Feb 2023 18:23:26 +0000</pubDate>
      <link>https://dev.to/aldifp01/insight-findings-from-a-case-study-of-customers-leaving-credit-card-services-4if3</link>
      <guid>https://dev.to/aldifp01/insight-findings-from-a-case-study-of-customers-leaving-credit-card-services-4if3</guid>
      <description>&lt;p&gt;In this post, I will share my Rakamin Academy Virtual Internship final project regarding a case study related to the problem of a bank manager.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Description
&lt;/h3&gt;

&lt;p&gt;In this project, we are expected to solve the manager's problem by finding insights into why customers leave the credit card service, drawing on several datasets through data engineering and processing. The project mainly uses SQL for querying the data and Tableau for data visualization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case Study:
&lt;/h3&gt;

&lt;p&gt;A manager at a bank is concerned about the increasing number of customers leaving their credit card service. They would really appreciate it if someone could build a customer profile that shows who is likely to churn, so they could proactively reach out to those customers, provide better service, and turn their decision around.&lt;/p&gt;

&lt;p&gt;From the case study above, the work is organized into four stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business Objectives&lt;/li&gt;
&lt;li&gt;Data Exploration&lt;/li&gt;
&lt;li&gt;Insights Presentation&lt;/li&gt;
&lt;li&gt;Conclusion and Solution&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Business Objectives
&lt;/h3&gt;

&lt;p&gt;From my perspective, here are some business objectives that need to be carried out to solve the problems in the case study.&lt;/p&gt;

&lt;h4&gt;
  
  
  Identify the profiles of customers or credit card customers.
&lt;/h4&gt;

&lt;p&gt;The first business objective is to identify the profiles of our credit card customers. Knowing this profile is very important: exploring it reveals the customers' circumstances and the background of those who are starting to leave the credit card service. To that end, customer profiles are identified by exploring and visualizing data on customer history.&lt;/p&gt;

&lt;p&gt;Data exploration and visualization carried out are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers by income and average credit limit: as we know, the higher a person's income, the higher the credit limit they are typically granted, so this relationship is worth exploring first.&lt;/li&gt;
&lt;li&gt;Customers by marital status and dependents: someone who is single and has few dependents may have little need for a credit card, especially with a fairly high income, so this is worth exploring.&lt;/li&gt;
&lt;li&gt;Customers by education and income category: education does not directly affect the credit limit, but it can affect income, which makes it worth exploring.&lt;/li&gt;
&lt;li&gt;Customers by credit card type and income: the dataset contains blue, gold, silver, and platinum cards, and this exploration examines how the card a customer holds relates to their income.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Identify the amount of shopping or transactions made by the customer.
&lt;/h4&gt;

&lt;p&gt;The second business objective is to identify the purchases or transactions made by customers. This shows transaction activity by credit card type and income, measured through the credit card usage ratio. The exploration therefore uses the features needed to characterize customer transactions: credit limit, average credit card spending, number of transactions, transaction frequency, and the credit card usage ratio, benchmarked by card type and income category.&lt;/p&gt;
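&lt;p&gt;As a rough illustration of the usage ratio mentioned above, here is a small pandas sketch. It assumes the ratio is the revolving balance divided by the credit limit; the column names and numbers are illustrative, not the dataset's exact schema.&lt;/p&gt;

```python
# Hypothetical sketch of a credit card usage (utilization) ratio,
# assumed here to be revolving balance / credit limit.
import pandas as pd

df = pd.DataFrame({
    "Credit_Limit": [4000.0, 12000.0, 30000.0],
    "Total_Revolving_Bal": [1000.0, 600.0, 0.0],
})
df["Utilization_Ratio"] = (df["Total_Revolving_Bal"] / df["Credit_Limit"]).round(2)
print(df["Utilization_Ratio"].tolist())  # [0.25, 0.05, 0.0]
```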

&lt;h4&gt;
  
  
  Identify the relationship between the customer and the bank.
&lt;/h4&gt;

&lt;p&gt;The last business objective is to identify the customers' relationship with the bank, given the problem at hand: the growing number of customers leaving the credit card service. As before, the analysis is split into two parts, by credit card type and by income. The features used here are the length of the relationship with the bank, the number of inactive months, the total contacts with the bank, and the credit card usage ratio.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Exploration
&lt;/h3&gt;

&lt;p&gt;The following is data exploration using SQL to find some insights that will be visualized later.&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer based on their income and credit limit
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Income_Category, COUNT(CLIENTNUM) as Total_Customer, 
ROUND(AVG(Credit_Limit),2) as Avg_Credit_Limit
FROM customer_data_history
GROUP BY Income_Category
ORDER BY FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo81xhg1ou8qpbkdz2uvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo81xhg1ou8qpbkdz2uvk.png" alt="Customer based on their income and credit limit" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer based on their marital status and dependent
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT m_db.marital_status, COUNT(*) as Status_Count,
ROUND(AVG(cdh.Dependent_count)) as AVG_Dependent,
cdh.Income_Category FROM customer_data_history cdh
LEFT JOIN marital_db m_db
ON cdh.Maritalid = m_db.id
GROUP BY m_db.marital_status, cdh.Income_Category
ORDER BY FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +"), 
marital_status ASC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy438ecij2gatpw3gbp5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy438ecij2gatpw3gbp5q.png" alt="Customer based on their marital status and dependent" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer based on their education status and income category
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT e_db.education_level, COUNT(*) as Education_Count,
cdh.Income_Category FROM customer_data_history cdh
LEFT JOIN education_db e_db
ON cdh.Educationid = e_db.id
GROUP BY e_db.education_level, cdh.Income_Category
ORDER BY FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +"), 
education_level ASC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y6ztuwi9r3vwmwch09z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y6ztuwi9r3vwmwch09z.png" alt="Customer based on their education status and income category" width="800" height="773"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer based on their credit card category and income category
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c_db.card_category, COUNT(*) as card_owner_total, cdh.Income_Category
FROM customer_data_history cdh
LEFT JOIN category_db c_db
ON cdh.card_categoryid = c_db.id
GROUP BY c_db.card_category,cdh.Income_Category
ORDER BY FIELD(c_db.card_category,"Blue","Gold","Silver","Platinum"),
FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakdxbvmh03avixghjvv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakdxbvmh03avixghjvv9.png" alt="Customer based on their credit card category and income category" width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer transactions based on their income category
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT Income_Category, COUNT(*) as Total_Consumer,
ROUND(AVG(Credit_Limit),2) as AVG_Credit_Limit, 
ROUND(AVG(Avg_Open_To_Buy),2) as AVG_Buying, 
ROUND(AVG(Total_Trans_Amt),2) as AVG_Transactions, 
ROUND(AVG(Total_Trans_Ct),2) as AVG_Transac_Freq, 
ROUND(AVG(Avg_Utilization_Ratio),2) as AVG_Credit_Util_Ratio 
FROM customer_data_history
GROUP BY Income_Category
ORDER BY FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4knjznmi01phjdhhjc50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4knjznmi01phjdhhjc50.png" alt="Customer based on their credit card category and income category" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer transactions based on their credit card category
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT c_db.card_category, COUNT(*) as Card_owner,
ROUND(AVG(cdh.Credit_Limit),2) as AVG_Credit_Limit, 
ROUND(AVG(cdh.Avg_Open_To_Buy),2) as AVG_Buying, 
ROUND(AVG(cdh.Total_Trans_Amt),2) as AVG_Transactions, 
ROUND(AVG(cdh.Total_Trans_Ct),2) as AVG_Transac_Freq, 
ROUND(AVG(cdh.Avg_Utilization_Ratio),2) as AVG_Credit_Util_Ratio 
FROM customer_data_history cdh
LEFT JOIN category_db c_db
ON cdh.card_categoryid = c_db.id
GROUP BY c_db.card_category
ORDER BY FIELD(c_db.card_category,"Blue","Gold","Silver","Platinum");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1khlx9mhgof5nh280nw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1khlx9mhgof5nh280nw5.png" alt="Customer based on their credit card category and income category" width="800" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Customer based on their contact and relation with bank
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT cdh.Income_Category, c_db.card_category, COUNT(*) as card_owner_total,  
AVG(cdh.Months_on_book) as AVG_Months_on_book,
ROUND(AVG(cdh.Total_Relationship_Count)) as AVG_Total_Rel_Count,
ROUND(AVG(cdh.Months_Inactive_12_mon)) as AVG_Months_Inactive,
ROUND(AVG(cdh.Contacts_Count_12_mon)) as AVG_Contact_Count,
ROUND(AVG(cdh.Avg_Utilization_Ratio),2) as AVG_Credit_Util_Ratio 
FROM customer_data_history cdh
LEFT JOIN category_db c_db
ON cdh.card_categoryid = c_db.id
GROUP BY c_db.card_category,cdh.Income_Category
ORDER BY FIELD(c_db.card_category,"Blue","Gold","Silver","Platinum"),
FIELD(Income_Category,"Unknown","Less than $40K","$40K - $60K","$60K - $80K","$80K - $120K","$120K +");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbywrsqwegfeem04qk5fr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbywrsqwegfeem04qk5fr.png" alt="Customer based on their contact and relation with bank" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Insight Presentation
&lt;/h3&gt;

&lt;p&gt;The following walks through the insights found during the data exploration above. I also built an interactive dashboard in Tableau, which can be accessed here: &lt;a href="https://public.tableau.com/app/profile/aldi.fianda.putra/viz/AldiFiandaPutra_TASK_5_DATA_ENGINEER_VIX_BTPNS/Dashboard1?publish=yes" rel="noopener noreferrer"&gt;Tableau Dashboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrvtqavhfp9m72nxg2ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrvtqavhfp9m72nxg2ki.png" alt="Income and Credit Limit Visualization" width="800" height="403"&gt;&lt;/a&gt;&lt;br&gt;
From the graph above, the majority of customers earn under $40K, while the highest average credit limit belongs to those earning $120K+. Since most customers have fairly low incomes, the credit limits they can be granted are also small: those earning below $40K have credit limits below $5K.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapstjtwe6elu7witp3sc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapstjtwe6elu7witp3sc.png" alt="Education Background Visualization" width="800" height="512"&gt;&lt;/a&gt;&lt;br&gt;
From the graph above, the majority of customers are undergraduates, followed by high school graduates, unknown, uneducated, still in college, postgraduates, and doctorates. It also shows that graduates form the largest group among customers earning less than $40K. In addition, education does not guarantee income: even some doctorate holders earn under $40K.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin56epck8bjlcfhajyvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin56epck8bjlcfhajyvu.png" alt="Marital Status and Dependent Visualization" width="800" height="435"&gt;&lt;/a&gt;&lt;br&gt;
Marital status, average dependents, and customer income are visualized in the treemap above. It turns out that customers who are married, have at least 2 dependents, and earn under $40K are the largest group, followed by single customers with income under $40K and 2 dependents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8t97s9s1co1lcqtk7yw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8t97s9s1co1lcqtk7yw.png" alt="Credit Card Category Visualization" width="800" height="466"&gt;&lt;/a&gt;&lt;br&gt;
The visualization above divides customers by credit card type and income. The Blue card is the most popular: every income group uses it, including those earning above $120K. Silver and gold follow, while the Platinum card has so few holders that it barely shows up in the visualization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtdq5dqpkxk3bl21qrj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtdq5dqpkxk3bl21qrj5.png" alt="Total Customer and Avg Transaction Visualization" width="800" height="347"&gt;&lt;/a&gt;&lt;br&gt;
The average number of transactions by income category can be seen in the image above. The averages are roughly similar across all income categories; the significant difference is only in the number of customers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk0593l9m8q8pdnykcm2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk0593l9m8q8pdnykcm2.png" alt="Total Customer and Avg Transaction Visualization" width="800" height="497"&gt;&lt;/a&gt;&lt;br&gt;
The average transactions by credit card type can be seen in the image above. Customers with blue cards account for most of the volume, yet their average number of transactions is lower than that of customers with silver, gold, and platinum cards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o944v28zl4yiwfxzfos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o944v28zl4yiwfxzfos.png" alt="Avg Spending and Credit Limit Visualization" width="800" height="349"&gt;&lt;/a&gt;&lt;br&gt;
The average customer spending can be seen from the graph above. Based on this visualization, customers with the highest income certainly have a fairly high average spending compared to customers with other income categories. From this visualization, it can also be seen that all customer categories tend to spend within the credit limit provided.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fml5qv1igbjfqltt6ur9f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fml5qv1igbjfqltt6ur9f.png" alt="Avg Spending and Credit Limit by Credit Card Category Visualization" width="800" height="497"&gt;&lt;/a&gt;&lt;br&gt;
The average customer spending by credit card type can be seen from the graph above. As in the previous visualization, customers with platinum cards have the largest average spending, higher even than blue, the card that the majority of customers, including those earning $120K+, were shown to hold. Customers also tend not to spend beyond their credit limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz43wk95mo0qvblpnlq5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz43wk95mo0qvblpnlq5s.png" alt="Customer Relation with Bank based on credit and product held by Customer" width="800" height="629"&gt;&lt;/a&gt;&lt;br&gt;
The ratio of credit card use to total products held can be seen from the graph above. Based on this visualization, it can be seen that customers with blue credit cards are far more likely to use their credit cards, the majority of which are used by customers with incomes below $40K. Then, in terms of products held, it is also dominated by customers with incomes below $40K, followed by customers with incomes above $120K.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynaogq88llzz2r1t9wku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynaogq88llzz2r1t9wku.png" alt="Customer Relation with Bank based on contact" width="800" height="609"&gt;&lt;/a&gt;&lt;br&gt;
The average contact between the bank and the customer can be seen in the graph above. Customers with platinum cards are contacted most often by the bank. In terms of length of relationship with the bank, customers earning $80K-$120K lead. Finally, the most inactive customers are those earning below $40K.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion and Solution
&lt;/h3&gt;

&lt;p&gt;From the business objectives, data exploration, and insight presentations that were carried out, several things and solutions can be concluded to overcome the problems in the case studies.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The majority of customers earn below $40K; most of them are married or single with at least 2 dependents. Their credit limit is also the lowest of any category.&lt;/li&gt;
&lt;li&gt;The majority of customers still use blue credit cards; even high-income customers, such as those earning $120K+, use this card rather than upgrading to silver, gold, or platinum.&lt;/li&gt;
&lt;li&gt;Average transactions are roughly similar across all income categories. By credit card, however, platinum cards have the highest average transactions, followed by gold and silver.&lt;/li&gt;
&lt;li&gt;By usage ratio, customers with blue cards use their credit cards the most, dominated by those earning below $40K. Unfortunately, customers in this income category also hold the most products, are contacted least by the bank, and are the most inactive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Solution
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Increasing the credit limit for lower-middle-income customers should encourage them to use their credit cards again.&lt;/li&gt;
&lt;li&gt;Since many customers, including those with middle-to-upper incomes, still use the Blue card, higher-tier cards should offer more benefits so that these customers are willing to switch to Silver, Gold, or Platinum.&lt;/li&gt;
&lt;li&gt;Since blue cards are the most common but have the lowest average transactions, rewards or promos for blue card holders could increase their desire to transact with their cards.&lt;/li&gt;
&lt;li&gt;Since customers earning below $40K are the largest group and hold the most products, the bank should contact them frequently, for example to share promos, benefits, and other credit card information.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>Deploying Multicontainer Application on Openshift</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Sun, 18 Sep 2022 15:15:09 +0000</pubDate>
      <link>https://dev.to/aldifp01/deploying-multicontainer-application-on-openshift-1c0e</link>
      <guid>https://dev.to/aldifp01/deploying-multicontainer-application-on-openshift-1c0e</guid>
      <description>&lt;p&gt;In this post I will share my Kominfo's Fresh Graduate Academy project about Deploying Multicontainer Applications on Openshift&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Openshift?
&lt;/h3&gt;

&lt;p&gt;Basically, Openshift is Red Hat's container orchestration platform, offered as a service for cloud development. As Red Hat puts it, Kubernetes is the kernel of a distributed system and Openshift is its distribution. Openshift makes it easier for developers to add PaaS capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Description
&lt;/h3&gt;

&lt;p&gt;This is a simple project to deploy a PHP application with MySQL as its database on Openshift. The source code was actually my assignment during my 4th semester at college. The deployment method I will use is source-to-image, popularly known as S2I.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yz0z3o5bai29y89s3y4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yz0z3o5bai29y89s3y4.png" alt="S2I Example, source: KubeSphere" width="800" height="322"&gt;&lt;/a&gt;&lt;em&gt;S2I Example, source: KubeSphere&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy Procedure
&lt;/h3&gt;

&lt;h4&gt;
  
  
  PHP Application
&lt;/h4&gt;

&lt;p&gt;The first thing to do is create a new project. Log in to your Openshift Web UI and choose the Developer role. Then click Topology, and in the project dropdown choose Create Project and fill in your project name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8gt74hm0ufxwg56so0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8gt74hm0ufxwg56so0h.png" alt="alt text" width="694" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the project is created, the Topology page looks like this, where you can choose a deployment source such as Git, Dockerfile, or YAML. In this case, I will deploy from Git.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0h9ppjgky5gur2ao6cyk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0h9ppjgky5gur2ao6cyk.png" alt="alt text" width="694" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing Git as the deployment source, a text box appears for our GitHub repository URL. Openshift picks a suitable builder image based on the source code; since most of my code is written in PHP, it chose the PHP builder image. Then select the PHP version you want to use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxxf46rbjfxujttghms7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxxf46rbjfxujttghms7.png" alt="alt text" width="736" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lastly, give the application a unique name, select Deployment Config as the resource type, check the "Create route to the application" checkbox, and then click Create. After this, Openshift will deploy the PHP application for us.&lt;/p&gt;

&lt;h4&gt;
  
  
  MySQL Application
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnca1iitbjsdcmntdo98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnca1iitbjsdcmntdo98.png" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next application to deploy is the database. To deploy it easily, I use the same method as with PHP: click the +Add button in the side menu, then choose Database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm24vbcnnkkd7mfpxxduh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm24vbcnnkkd7mfpxxduh.png" alt="alt text" width="689" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking Database, the next page asks us to choose a database template. In this case I chose MySQL, after which a new page appears where we fill in the desired values: namespace, service name, username, password, database name, volume capacity, and database version. Click Create and OpenShift will create the MySQL application.&lt;/p&gt;

&lt;h4&gt;
  
  
  Connecting both applications
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcopymhc8uzf934zj7va0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcopymhc8uzf934zj7va0.png" alt="alt text" width="575" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The PHP and MySQL applications are not connected automatically after deployment, so we need to connect them ourselves. The topology of both applications is shown in the image above. In my source code, I created a PHP connection file with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$host=getenv("databaseip");
$port=getenv("databaseport");
$user=getenv("databaseuser");
$pass=getenv("databasepassword");
$db=getenv("databasename");

$mysqli=mysqli_connect($host,$user,$pass,$db);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That block of code fetches the database host, port, user, password, and database name from environment variables on the PHP pod. We therefore need to set those variables on the PHP pod's environment (you need your database service IP; in this case, I logged in to the Red Hat VM and ran &lt;code&gt;oc get svc&lt;/code&gt; to get it). The result will look like the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c4y0paj5k463fhqnsbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c4y0paj5k463fhqnsbn.png" alt="alt text" width="677" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After filling in those database variables, click Save and redeploy the PHP application. Then click the "Open URL" button at the top right of the PHP application and you will be redirected to it. In my project, the result looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhoy1bynkqlz669zsk7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhoy1bynkqlz669zsk7j.png" alt="alt text" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are still two problems in my project: the authentication method for the database, and the fact that no table named "Proker" exists in the database yet. To fix the first problem, log in with the root account (usually with &lt;code&gt;mysql -u root&lt;/code&gt;, then enter your password) and run the statement below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER USER 'username'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why use % and not localhost? Because &lt;code&gt;'username'@'localhost'&lt;/code&gt; only allows connections from localhost itself, which the PHP pod is not. &lt;code&gt;'username'@'%'&lt;/code&gt; allows connections from any host, so the PHP application can reach the database through its service IP.&lt;/p&gt;

&lt;p&gt;Next, create the missing table for the database. This one is simple: a minimal table like the one below is enough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE proker(
    nomorProgram INT NOT NULL,
    namaProgram VARCHAR(32) NOT NULL,
    suratKeterangan VARCHAR(32) NOT NULL,
    PRIMARY KEY (nomorProgram)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqpw4mxcph99q6nwb2cs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqpw4mxcph99q6nwb2cs.png" alt="alt text" width="800" height="238"&gt;&lt;/a&gt;&lt;em&gt;When the table is still empty&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oev3ghir3e6ddb31wbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oev3ghir3e6ddb31wbe.png" alt="alt text" width="800" height="238"&gt;&lt;/a&gt;&lt;em&gt;The table after adding some new rows&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After that, open my page again and the main page will render a table like the one above: empty at first, but we can add new rows to the database, as shown in the second image. And with that, my project ends. Thank you for taking the time to read my article.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sorry for the bad image resolution. Since I can no longer log in to my Red Hat account (it was only accessible during the course), I had to crop the images from my final project report.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Streaming Twitter Data and Scheduling</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Fri, 25 Feb 2022 12:14:55 +0000</pubDate>
      <link>https://dev.to/aldifp01/streaming-twitter-data-and-scheduling-5dfn</link>
      <guid>https://dev.to/aldifp01/streaming-twitter-data-and-scheduling-5dfn</guid>
<description>&lt;p&gt;In this post, I will share my data engineering course final project from the previous semester. The task was to create an ETL process for streaming data. The data is streamed from Twitter and fetched using Kafka, with some additional tasks such as using Airflow to schedule everything and creating a dashboard. In this project, we stream tweets about Windows 11, so the output after the streaming and ETL process is a sentiment analysis of Windows 11.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Kafka
&lt;/h2&gt;

&lt;p&gt;Before installing Kafka, make sure that you have Java 1.8.0 and Scala. Then head to &lt;a href="https://kafka.apache.org/downloads" rel="noopener noreferrer"&gt;https://kafka.apache.org/downloads&lt;/a&gt; and download the release that matches the Scala version installed on your computer; since my laptop has Scala 2.12, we download the Kafka 3.0.0 build for Scala 2.12. Installing Kafka is pretty easy: just extract the archive somewhere, such as your home directory. After extracting, change the snapshot directory in zookeeper.properties and the log directory in server.properties to point at your Kafka installation, like this.&lt;/p&gt;

&lt;p&gt;for snapshot directory&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dataDir=/home/USER/kafka/data/zookeeper 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;for log directory&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log.dirs=/home/USER/kafka/data/kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After setting both directories, you can use Kafka right away. You can also add Kafka's bin directory to your PATH environment variable for easier use. Additionally, if your Kafka broker won't start, uncomment the listeners line in server.properties, as in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpm9tvhvmwl57s5g1nyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpm9tvhvmwl57s5g1nyj.png" alt="Listener" width="630" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Kafka installation is now complete. You can run both the ZooKeeper server and the Kafka server with the commands below (make sure you start ZooKeeper first).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zookeeper-server-start.sh ../../config/zookeeper.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kafka-server-start.sh ../../config/server.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing Airflow
&lt;/h2&gt;

&lt;p&gt;The reason we use Linux for this project is that installing Airflow is harder on Windows. To install Airflow, first make sure you already have pip for Python. Then run "pip install apache-airflow" and wait until the installation completes. You can add Airflow to your PATH just as we did for Kafka.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5btrpb04b3oq73g0own.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5btrpb04b3oq73g0own.png" alt="Environment Variable" width="540" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the installation completes, initialize the metadata database with the command &lt;code&gt;airflow db init&lt;/code&gt;. This command is only needed once, the first time we use Airflow. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqnn3o30scadwemfm1tu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqnn3o30scadwemfm1tu.png" alt="Initialization Airflow DB" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we can use Airflow. Open two different consoles and run &lt;code&gt;airflow webserver -p 8080&lt;/code&gt; and &lt;code&gt;airflow scheduler&lt;/code&gt;, one in each console, as in the images below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwly44ritzrnmeu63eu4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwly44ritzrnmeu63eu4x.png" alt="Airflow Webserver" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfpmgz30p5exmmmzwvav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfpmgz30p5exmmmzwvav.png" alt="Airflow Scheduler" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can access Airflow at localhost:8080, but you need credentials first. Create a new account by running &lt;code&gt;airflow users create -u USERNAME -p PASSWORD -f FIRSTNAME -l LASTNAME -r ROLE -e EMAIL&lt;/code&gt;, then log in to Airflow with the account you have just created. After you log in, the homepage will look like the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz08ruau0sjp16spsn7ze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz08ruau0sjp16spsn7ze.png" alt="Airflow Homepage" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming data and Simple ETL
&lt;/h2&gt;

&lt;p&gt;Now we are at the main subject: streaming data and the ETL process. First, create a topic by running the command &lt;code&gt;kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic windows_11&lt;/code&gt; (Kafka 3.x removed the old --zookeeper flag from kafka-topics, so we point at the broker instead). Next, we create several Python files for the simple ETL process, since we are going to schedule each program with Airflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stream
&lt;/h3&gt;

&lt;p&gt;To stream the data, we create a Python file. Since we fetch data from Twitter, make sure you have a Twitter developer account to get access tokens and keys. You can use the block of code below (change the consumer and access keys and secrets to yours), then save it into a Python file named stream.py.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import library
import tweepy
from kafka import KafkaProducer
from datetime import datetime, timedelta
import csv
import pandas as pd

# Twitter Consumer and Access
consumer_key = YOUR_KEY
consumer_secret = YOUR_KEY_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET

# setup authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# instantiation API
api = tweepy.API(auth, wait_on_rate_limit=True)

# Adjust time to local time (GMT +7 / UTC +7)
def normalize_time(time):
    mytime = datetime.strptime(time, "%Y-%m-%d %H:%M:%S")
    mytime += timedelta(hours=7)
    return (mytime.strftime("%Y-%m-%d %H:%M:%S"))

# instantiation of Kafka Producer
producer = KafkaProducer(bootstrap_servers=['localhost:9092'], api_version=(0, 10, 1))

# Topic initialization, keyword, and max query
topic_name = 'windows_11'
search_key = "windows 11"
maxId = -1
maxTweets = 3000
tweetCount = 0
tweetsPerQuery = 500

# Csv declaration, change path to yours
csvFile = open("/home/stndb01/Documents/Data_Engineering/Proyek/"+search_key+".csv", "a+", newline="", encoding="utf-8")
csvWriter = csv.writer(csvFile)

# List declaration
tweet_id = []
tweet_username = []
tweet_text = []

# Looping to fetch tweet by using twitter API until specified limit
while tweetCount &amp;lt; maxTweets:
    # getting first tweet data
    if maxId &amp;lt;= 0:
        # newTweets = api.search_tweets(q=search_key, lang="en", count=tweetCount, max_id=maxId)
        newTweets = api.search_tweets(q=search_key, lang="en", count=tweetCount)
    # getting data from second tweet kedua and forth
    newTweets = api.search_tweets(q=search_key, lang="en", count=tweetsPerQuery)

    # getting certain attribute from tweet
    for i in newTweets:
        record = str(i.user.id_str)
        record += ';'
        record = str(i.user.name)
        record += ';'
        # record += str(normalize_timestamp(str(i.created_at)))
        # record += ';'
        # record += str(i.full_text.encode('utf-8'))
        record += str(i.text.encode('utf-8'))
        record += ';'
        print(str.encode(record))
        producer.send(topic_name, str.encode(record))

        tweet_id.append(str(i.user.id_str))
        tweet_username.append(str(i.user.name))
        tweet_text.append(str(i.text.encode('utf-8')))
        tweets = [str(i.user.id_str), str(i.user.name), str(i.text.encode('utf-8'))]
        csvWriter.writerow(tweets)

    # adding TweetCount and MaxId
    tweetCount += len(newTweets)
    maxId = newTweets[-1].id

dictTweets = {"id":tweet_id, "username":tweet_username, "text":tweet_text}
df = pd.DataFrame(dictTweets)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
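&lt;p&gt;As a quick sanity check, the normalize_time helper from stream.py can be run on its own (a standalone copy, with a made-up timestamp):&lt;/p&gt;

```python
from datetime import datetime, timedelta

# same UTC+7 shift as the helper in stream.py
def normalize_time(time_str):
    mytime = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
    mytime += timedelta(hours=7)
    return mytime.strftime("%Y-%m-%d %H:%M:%S")

print(normalize_time("2022-02-25 12:14:55"))  # 2022-02-25 19:14:55
```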



&lt;h3&gt;
  
  
  Extract
&lt;/h3&gt;

&lt;p&gt;The next step is to extract the data from the CSV file we've just created. In the extract step, we copy the data into a new temporary CSV with the suffix _e; this file will be deleted in a later step. Save the code below into a new Python file named extract.py.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
kolom = ['id', 'username', 'tweet']
df_windows = pd.read_csv('/home/stndb01/Documents/Data_Engineering/Proyek/windows 11.csv', names = kolom)

df_windows.to_csv('/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_e.csv')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Transform
&lt;/h3&gt;

&lt;p&gt;The CSV file we've just extracted contains duplicated rows, so in this step we transform the data by removing the duplicates. The code below reads the previous windows_11_e.csv, drops the duplicated rows, and saves the result to a new CSV called windows_11_t.csv. Save it into a new Python file named transform.py.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
kolom = ['id', 'username', 'tweet']
df_windows = pd.read_csv('/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_e.csv', names = kolom)
#df_windows.duplicated().value_counts()


df_windows.drop_duplicates(inplace=True, ignore_index=True)
df_windows
df_windows.to_csv('/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_t.csv')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
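&lt;p&gt;To see what the drop_duplicates call does here, a tiny example with made-up rows (not the real tweet data):&lt;/p&gt;

```python
import pandas as pd

# two identical tweets and one unique tweet
df_demo = pd.DataFrame({"id": ["1", "1", "2"],
                        "tweet": ["hello world", "hello world", "windows 11"]})
df_demo.drop_duplicates(inplace=True, ignore_index=True)
print(len(df_demo))  # 2
```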



&lt;h3&gt;
  
  
  Load
&lt;/h3&gt;

&lt;p&gt;After we transform the data, we load the data into a local database. For this project, we used SQLite3 for the database. The code below is used to load data. This code also deletes both the temporary CSV files that were created in the extract and transform steps. Save the code below into a new python file named load.py.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3 as s3
import pandas as pd
import os

# creating connection to database
conn = s3.connect('/home/stndb01/Documents/Data_Engineering/Proyek/windows11_data.db')

# Object cursor to run SQL query
cur = conn.cursor()

# load dataframe to datamart
df_windows = pd.read_csv('/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_t.csv')
df_windows.to_sql('windows11_table',conn,if_exists='replace',index=False)
os.remove("/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_e.csv")
os.remove("/home/stndb01/Documents/Data_Engineering/Proyek/windows_11_t.csv")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
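&lt;p&gt;The to_sql/read_sql round trip used in load.py can be checked against an in-memory SQLite database; the sample rows below are hypothetical, not the real tweets:&lt;/p&gt;

```python
import sqlite3 as s3

import pandas as pd

conn = s3.connect(":memory:")  # throwaway database instead of windows11_data.db
df_demo = pd.DataFrame({"id": ["1", "2"], "username": ["a", "b"], "tweet": ["x", "y"]})
df_demo.to_sql("windows11_table", conn, if_exists="replace", index=False)

back = pd.read_sql("select * from windows11_table", conn)
print(len(back))  # 2
```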



&lt;h2&gt;
  
  
  Scheduling
&lt;/h2&gt;

&lt;p&gt;The files for the stream and ETL processes have been created, so we can now schedule those tasks. In a new Python file, we create a DAG (Directed Acyclic Graph) that ties every task together. The DAG code looks like the code below; save it into a new Python file named scheduling-proyek.py.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from datetime import timedelta
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
from airflow import DAG

# default arguments for every task; set owner to your Airflow username
default_args = {
    'owner': 'admin',
    'depends_on_past': False,
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
}
with DAG(
    'Proyek_Akhir',
    default_args=default_args,
    description='Scheduling for the final project tasks',
    schedule_interval=timedelta(days=1),
    start_date=days_ago(2),
    tags=['proyek'],
) as dag:

    stream        = BashOperator(
                    task_id='Stream_data_twitter',
                    bash_command='python /home/stndb01/Documents/Data_Engineering/Proyek/stream.py'
                    #bash_command='python stream.py',
                    )

    extract        = BashOperator(
                    task_id='Extract_data_twitter',
                    bash_command='python /home/stndb01/Documents/Data_Engineering/Proyek/extract.py'
                    #bash_command='python extract.py',
                    )

    transform        = BashOperator(
                    task_id='Transform_data_twitter',
                    bash_command='python /home/stndb01/Documents/Data_Engineering/Proyek/transform.py'
                    #bash_command='python transform.py',
                    )

    load        = BashOperator(
                    task_id='Load_data_twitter',
                    bash_command='python /home/stndb01/Documents/Data_Engineering/Proyek/load.py'
                    #bash_command='python load.py',
                    )

    # define the execution order: set_downstream runs the task on
    # the left before the task on the right
    stream.set_downstream(extract)
    extract.set_downstream(transform)
    transform.set_downstream(load)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you save the file, move it into the dags folder inside your Airflow home directory (create the folder if it does not exist). You can run &lt;code&gt;python scheduling-proyek.py&lt;/code&gt; once to check the file for syntax errors; the scheduler then picks the DAG up automatically. Return to the Airflow homepage and you will see a new DAG for your project, like in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpryc3gg25x0mp7i6sakk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpryc3gg25x0mp7i6sakk.png" alt="New DAG" width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, click the DAG and you will be redirected to a page with a tree view of the tasks in the order you chained them. Click the play button and the tasks will run on the interval you set. Since the start date was set two days ago and the interval to one day, all tasks run in order, and every task that completes is marked green. The result will look like the image below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx51otgj9p2r8q6ku7tdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx51otgj9p2r8q6ku7tdg.png" alt="DAG Complete" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sentiment Analysis
&lt;/h2&gt;

&lt;p&gt;To do the sentiment analysis, we load the data back from the SQLite database. The code and result look like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Creating connection to database
conn = s3.connect('windows11_data.db')

# creating cursor
cur = conn.cursor()
query = '''select * from windows11_table'''
df = pd.read_sql(query, conn)
df
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjekj09p0r2dnrni1gn5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjekj09p0r2dnrni1gn5w.png" alt="Load Result" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the transform step only removed duplicated tweets, some artifacts remain that can interfere with the analysis, such as emoji, URL links, and symbols. Because the tweets were saved in byte format, these need to be cleaned up, so we use regular expressions to remove them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def preprocess_text(review):
    review = codecs.decode(review, 'unicode_escape') #remove escape character
    review = review[2:-1]
    review = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL', review)
    review = re.sub('[^\x00-\x7f]', '', review) 
    review = re.sub('@[^\s]+','USER', review)
    review = review.lower().replace("ё", "е")
    review = re.sub('[^a-zA-Zа-яА-Я1-9]+', ' ', review)
    review = re.sub(' +',' ', review)
    return review.strip()
df['tweet'] = [preprocess_text(review) for review in df['tweet']]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
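&lt;p&gt;A condensed version of the same cleanup (without the byte decoding and Cyrillic handling) makes the effect easy to see on a made-up raw tweet:&lt;/p&gt;

```python
import re

def clean_tweet(text):
    # replace links and mentions with placeholders, then keep only letters and digits 1-9
    text = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', text)
    text = re.sub(r'@[^\s]+', 'USER', text)
    text = text.lower()
    text = re.sub(r'[^a-zA-Z1-9]+', ' ', text)
    text = re.sub(r' +', ' ', text)
    return text.strip()

print(clean_tweet("RT @someone: Windows 11 is out!! https://t.co/abc123"))
# rt user windows 11 is out url
```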



&lt;p&gt;After that, the data is ready to be analyzed. Next, we create a new function using the textblob library to compute subjectivity, polarity, and the resulting label. The results are added as new columns to the dataframe, and from them we know which tweets can be considered positive, negative, or neutral reviews of the Windows 11 launch. The code and result are below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from textblob import TextBlob

def sentiment_analysis(data):
  def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

  #Create a function to get the polarity
  def getPolarity(text):
    return TextBlob(text).sentiment.polarity

  def getAnalysis(score):
    if score &amp;lt; 0:
      return "Negative"
    elif score == 0:
      return "Neutral"
    else:
      return "Positive"

  #Create two new columns 'Subjectivity' &amp;amp; 'Polarity'
  data["TextBlob_Subjectivity"] = data["tweet"].apply(getSubjectivity)
  data["TextBlob_Polarity"] = data["tweet"].apply(getPolarity)

  data["TextBlob_Analysis"] = data["TextBlob_Polarity"].apply(getAnalysis)
  return data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
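&lt;p&gt;The polarity thresholds can be exercised without TextBlob by feeding a standalone copy of getAnalysis a few hand-picked scores:&lt;/p&gt;

```python
def get_analysis(score):
    # same thresholds as getAnalysis above
    if score > 0:
        return "Positive"
    if score == 0:
        return "Neutral"
    return "Negative"

print([get_analysis(s) for s in (-0.5, 0.0, 0.7)])
# ['Negative', 'Neutral', 'Positive']
```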



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flso3667o2smsvtrjrvzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flso3667o2smsvtrjrvzc.png" alt="Subjectivity and polarity" width="465" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next code is an optional step: we count the 20 most frequent words, which will also be displayed alongside the sentiment analysis result later.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter

def getMostFrequentWord(df):
    most_freq = Counter(" ".join(df["tweet"]).split()).most_common(30)

    most_freq_filtered = {}
    for i in most_freq:
        if(i[0] == 'user' or i[0] == 'url' or i[0] == 'rt'):
            continue
        else:
            most_freq_filtered[i[0]] = i[1]
            if(len(most_freq_filtered) == 20): # take the 20 most frequent words
                break

    return most_freq_filtered

#Calling the function
most_freq = getMostFrequentWord(df)
most_freq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
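&lt;p&gt;The placeholder-filtering logic in getMostFrequentWord can be checked on a toy corpus of already-cleaned, made-up tweets:&lt;/p&gt;

```python
from collections import Counter

tweets = ["rt user windows 11 url", "windows 11 is great", "user loves windows"]
most_freq = Counter(" ".join(tweets).split()).most_common(30)

# drop the placeholder tokens, as the function above does
filtered = {w: c for w, c in most_freq if w not in ("user", "url", "rt")}
print(filtered["windows"], filtered["11"])  # 3 2
```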

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;The last step in this project is creating a dashboard that displays the sentiment analysis results in a pie chart together with the 20 most frequent words. The code and result look like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f, axes = plt.subplots(1, 2, figsize = (15, 7))
# sns.set_style("white")
colors = sns.color_palette('pastel')[0:5]
# colors = sns.color_palette('bright')[0:5]
f.suptitle("{} Twitter Topic Analysis".format(TOPIC), fontsize=20)
f.tight_layout(pad=3.0)

# figure 1
ax1 = plt.subplot2grid((1,2),(0,0))
ax1.title.set_text("{} Tweet Sentiment".format(TOPIC))
fig1 = plt.pie(df.groupby("TextBlob_Analysis").size(), labels = df.groupby("TextBlob_Analysis").size().index, colors = colors, autopct='%.0f%%')

# figure 2
ax2 = plt.subplot2grid((1,2),(0,1))
ax2.title.set_text("{} Tweet Most Common Words".format(TOPIC))
fig2 = sns.barplot(x=list(most_freq.keys()), y=list(most_freq.values()))
fig2.set(xlabel='Most Common Words', ylabel='Frequency')
for item in fig2.get_xticklabels():
    item.set_rotation(90)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
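&lt;p&gt;To keep a copy of the dashboard rather than only viewing it interactively, the figure object can also be written to disk. This is a minimal sketch under assumed names: the filename and the Agg backend are arbitrary choices, and the empty canvas here stands in for the populated figure f built above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib
matplotlib.use("Agg")  # headless backend; omit this when running interactively
import matplotlib.pyplot as plt

# Recreate an empty 1x2 canvas; in the notebook, `f` already holds
# the pie and bar charts built above
f, axes = plt.subplots(1, 2, figsize=(15, 7))
f.savefig("dashboard.png", dpi=150, bbox_inches="tight")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;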



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmbbxebubrn8mtmr2iqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmbbxebubrn8mtmr2iqh.png" alt="dashboard" width="488" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://medium.com/@jacksonbull1987/how-to-install-apache-airflow-6b8a2ae60050" rel="noopener noreferrer"&gt;How to install Apache Airflow&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.bmc.com/blogs/working-streaming-twitter-data-using-kafka/" rel="noopener noreferrer"&gt;Working with streaming Twitter data using Kafka&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-test-your-data-with-great-expectations" rel="noopener noreferrer"&gt;How to test your data with Great Expectations&lt;/a&gt;&lt;br&gt;
&lt;a href="https://towardsdatascience.com/my-absolute-go-to-for-sentiment-analysis-textblob-3ac3a11d524" rel="noopener noreferrer"&gt;Sentiment Analysis using TextBlob&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.analyticsvidhya.com/blog/2020/04/beginners-guide-exploratory-data-analysis-text-data" rel="noopener noreferrer"&gt;Exploratory data analysis for text data: EDA using python&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
<title>Wiki.js for Organizations</title>
      <dc:creator>Aldi Fianda Putra</dc:creator>
      <pubDate>Sun, 29 Aug 2021 12:05:05 +0000</pubDate>
      <link>https://dev.to/aldifp01/wiki-js-untuk-organisasi-1bbc</link>
      <guid>https://dev.to/aldifp01/wiki-js-untuk-organisasi-1bbc</guid>
      <description>&lt;h2&gt;
  
  
  What is Wiki.js?
&lt;/h2&gt;

&lt;p&gt;Wiki.js is an open-source wiki application that runs on Node.js and is written in JavaScript. In Wiki.js, content is written in Markdown files, and Git is used to store the content that will be published.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Requirements for Wiki.js
&lt;/h2&gt;

&lt;p&gt;In this post, I will walk through the installation procedure for Wiki.js version 2.x. The following services must therefore be installed first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js version 10.12 or newer&lt;/li&gt;
&lt;li&gt;A database such as PostgreSQL, MariaDB, MySQL, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation Procedure
&lt;/h2&gt;

&lt;p&gt;For this installation, I will set up Wiki.js on a Linux machine using Docker Compose, so Node.js and the database do not need to be installed beforehand. The following docker-compose.yml configuration installs Wiki.js on Linux.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3"
services:

  db:
    image: postgres:11-alpine
    environment:
      POSTGRES_DB: wiki
      POSTGRES_PASSWORD: wikijsrocks
      POSTGRES_USER: wikijs
    logging:
      driver: "none"
    restart: unless-stopped
    volumes:
      - db-data:/var/lib/postgresql/data

  wiki:
    image: requarks/wiki:2
    depends_on:
      - db
    environment:
      DB_TYPE: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_USER: wikijs
      DB_PASS: wikijsrocks
      DB_NAME: wiki
    restart: unless-stopped
    ports:
      - "3000:3000"

volumes:
  db-data:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The docker-compose.yml configuration above is the default provided on the Wiki.js documentation page, except that I changed the port mapping to 3000:3000 because port 80 was already in use on my machine. Once the configuration is ready, we can run the following command in the same directory as the YAML file to start the Wiki.js service.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
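&lt;p&gt;Optionally, the stack can also be run in the background. This is a sketch of the usual Docker Compose workflow, not anything specific to Wiki.js:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Start both containers detached
docker-compose up -d

# Follow the logs of the wiki service only
docker-compose logs -f wiki

# Check that both services are running
docker-compose ps

# Stop everything (add -v to also remove the db-data volume)
docker-compose down
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;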



&lt;p&gt;After running the command above, the Wiki.js installation process will start; give it a moment to finish. Once the output below appears, Wiki.js can be accessed through a browser.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fterminal-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fterminal-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Based on the terminal output shown earlier, we are asked to open &lt;a href="http://YOUR-SERVER-IP:3000/" rel="noopener noreferrer"&gt;http://YOUR-SERVER-IP:3000/&lt;/a&gt; to continue the setup process. Since Wiki.js is installed on the local machine, the address to open is &lt;a href="https://127.0.0.1:3000" rel="noopener noreferrer"&gt;https://127.0.0.1:3000&lt;/a&gt; or localhost:3000, which brings up the following page.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Finstall-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Finstall-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Once localhost:3000 looks like the screenshot above, the Wiki.js installation has succeeded. Next, we can register an account on that page; the registration page also lets us set the site URL. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Flogin-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Flogin-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
The screen above appears once registration is complete: a login page where we sign in with the credentials we just created.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fwelcome-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fwelcome-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
The screen above appears on first login, when there is no content yet, so we can click the "Create Home Page" button to create a new page.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fcreate-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fcreate-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Content in Wiki.js can be created in several ways: in Markdown, just like posts on dev.to, or in HTML or rich text format. When creating content for the first time, we are also asked to choose the directory where it will be stored, which defaults to /en/. In addition, we can add tags to a page so readers can find other content relevant to it through those tags.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fthread-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fthread-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
In this post, I demonstrate creating content with Markdown. The workflow is much like writing a dev.to post, as shown above: the left pane is the Edit view where the content is written, and the right pane is a Preview of what we have written. Once the content is done, we can click the Create button at the top to create the new page.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fresult-wikijs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fstndb01%2Flinux%2Fblob%2Fmaster%2Fscreenshot%2Fwikijsdevto%2Fresult-wikijs.png%3Fraw%3Dtrue" alt="alt text" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
The screenshot above shows the page we just created, which serves as the main page. To create another page, simply click the "New Page" button in the top-right corner and follow the same procedure as before. With that, the Wiki.js site is ready to be used for your organization's content.&lt;/p&gt;

&lt;p&gt;That concludes this article on creating organizational content with Wiki.js. Please let me know if you spot any mistakes, and I hope you found it useful. Thank you.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
