<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LuxDevHQ</title>
    <description>The latest articles on DEV Community by LuxDevHQ (@luxdevhq).</description>
    <link>https://dev.to/luxdevhq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4798%2Fdb1d01d7-dfac-4ccd-8ad8-a8ef16600073.jpg</url>
      <title>DEV Community: LuxDevHQ</title>
      <link>https://dev.to/luxdevhq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/luxdevhq"/>
    <language>en</language>
    <item>
      <title>Supermarket Sales and Customer Insights Dashboard — A Practical Power BI Project Guide.</title>
      <dc:creator>Mwenda Harun Mbaabu</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:21:21 +0000</pubDate>
      <link>https://dev.to/luxdevhq/supermarket-sales-and-customer-insights-dashboard-a-practical-power-bi-project-guide-4o7p</link>
      <guid>https://dev.to/luxdevhq/supermarket-sales-and-customer-insights-dashboard-a-practical-power-bi-project-guide-4o7p</guid>
      <description>&lt;p&gt;This technical article walks you step by step through a &lt;strong&gt;beginner-friendly Power BI project&lt;/strong&gt; using a real-world supermarket transactions dataset. By the end of this guide, you will know &lt;strong&gt;where to download the data&lt;/strong&gt;, &lt;strong&gt;how to prepare it&lt;/strong&gt;, and &lt;strong&gt;how to build an interactive dashboard&lt;/strong&gt; that answers real business questions.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Project Overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In this project, you will analyze supermarket transaction data and transform it into an &lt;strong&gt;interactive Power BI dashboard&lt;/strong&gt;. The focus is not just on visuals, but on &lt;strong&gt;answering business questions clearly and professionally&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You will act as a &lt;strong&gt;Junior Data Analyst&lt;/strong&gt;, converting raw transaction records into insights that business stakeholders can explore without using spreadsheets.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Dataset Download&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can download the dataset used in this project &lt;a href="https://github.com/LuxDevHQ/Data0" rel="noopener noreferrer"&gt;here&lt;/a&gt; or from &lt;a href="https://github.com/LuxDevHQ/Data" rel="noopener noreferrer"&gt;https://github.com/LuxDevHQ/Data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset contents:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Three years of supermarket transaction data
&lt;/li&gt;
&lt;li&gt;Multiple store locations (Australia)
&lt;/li&gt;
&lt;li&gt;Individual transaction-level records
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Columns include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product Name
&lt;/li&gt;
&lt;li&gt;Quantity Sold
&lt;/li&gt;
&lt;li&gt;Total Sales Amount
&lt;/li&gt;
&lt;li&gt;Payment Method
&lt;/li&gt;
&lt;li&gt;Customer Type (Member / Non-Member)
&lt;/li&gt;
&lt;li&gt;Store Location
&lt;/li&gt;
&lt;li&gt;Transaction Date
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;&lt;a href="https://github.com/LuxDevHQ/Data0" rel="noopener noreferrer"&gt;Download the dataset&lt;/a&gt;&lt;/strong&gt;  &lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Business Questions This Project Answers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before opening Power BI, it is important to understand &lt;strong&gt;what questions the dashboard should answer&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Sales Performance&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What is the total sales amount across all stores?&lt;/li&gt;
&lt;li&gt;How do sales trend over time (monthly and yearly)?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Product Analysis&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Which products generate the highest revenue?&lt;/li&gt;
&lt;li&gt;How do apple sales compare across different payment methods?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Customer Behavior&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;How much do members vs non-members spend?&lt;/li&gt;
&lt;li&gt;Which customer type contributes more to total revenue?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Payment Method Insights&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Which payment method is used most frequently?&lt;/li&gt;
&lt;li&gt;How does revenue differ by payment method?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Store Performance&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Which store location generates the highest sales?&lt;/li&gt;
&lt;li&gt;How does customer behavior vary by store?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions will guide every step of the analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Load the Data into Power BI&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;Power BI Desktop&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Get Data → Text/CSV&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;supermarket_transactions.csv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Load the data into Power BI&lt;/li&gt;
&lt;li&gt;Review column names and preview the data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this stage, do not build visuals yet. First, ensure the data is correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: Data Cleaning in Power Query&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Open &lt;strong&gt;Transform Data&lt;/strong&gt; to enter Power Query.&lt;/p&gt;

&lt;p&gt;Perform the following actions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove unnecessary or duplicate columns&lt;/li&gt;
&lt;li&gt;Fix incorrect data types:

&lt;ul&gt;
&lt;li&gt;Dates → Date&lt;/li&gt;
&lt;li&gt;Sales &amp;amp; Quantity → Decimal / Whole Number&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Rename columns for clarity (e.g. &lt;code&gt;Total Sales Amount&lt;/code&gt;)&lt;/li&gt;

&lt;li&gt;Check for missing or inconsistent values&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Clean data is critical. Poor data quality leads to misleading dashboards.&lt;/p&gt;
&lt;/blockquote&gt;
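
&lt;p&gt;If you want a quick, optional way to spot these issues before clicking through Power Query, you can profile the raw CSV with a few lines of Python. This is a minimal sketch, assuming the file is named &lt;code&gt;supermarket_transactions.csv&lt;/code&gt; and pandas is installed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Optional pre-check of the raw CSV before cleaning in Power Query.
# Assumes the file is named supermarket_transactions.csv and pandas is installed.
import pandas as pd

df = pd.read_csv("supermarket_transactions.csv")

print(df.dtypes)              # spot columns loaded with the wrong type
print(df.isna().sum())        # count missing values per column
print(df.duplicated().sum())  # count fully duplicated rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;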




&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: Data Modeling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once the data is clean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm all columns have correct data types&lt;/li&gt;
&lt;li&gt;Ensure the table structure is logical&lt;/li&gt;
&lt;li&gt;No complex relationships are required for this project (single-table model)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project focuses on &lt;strong&gt;analysis and visualization&lt;/strong&gt;, not complex modeling.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Step 4: Create Beginner-Level DAX Measures&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Create the following measures in &lt;strong&gt;Model view&lt;/strong&gt; or &lt;strong&gt;Report view&lt;/strong&gt;:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Total Sales =
SUM(supermarket_transactions[Total Sales Amount])

Total Quantity Sold =
SUM(supermarket_transactions[Quantity])

Average Transaction Value =
AVERAGE(supermarket_transactions[Total Sales Amount])

// Same expression as Total Sales; it returns per-customer-type values when
// Customer Type is placed on a visual axis or slicer.
Sales by Customer Type =
SUM(supermarket_transactions[Total Sales Amount])


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ These measures will power your KPI cards and charts.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Step 5: Build the Power BI Dashboard&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Create a &lt;strong&gt;1–2 page interactive Power BI dashboard&lt;/strong&gt; using the visuals listed below. The dashboard should be designed for &lt;strong&gt;business users&lt;/strong&gt;, not technical users.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Required Visuals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;KPI Cards&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total Sales&lt;/li&gt;
&lt;li&gt;Total Quantity Sold&lt;/li&gt;
&lt;li&gt;Average Transaction Value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Charts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bar Chart:&lt;/strong&gt; Sales by Product&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bar Chart:&lt;/strong&gt; Sales by Store Location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pie or Column Chart:&lt;/strong&gt; Payment Method Distribution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Line Chart:&lt;/strong&gt; Sales Trend Over Time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Slicers&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store Location&lt;/li&gt;
&lt;li&gt;Product&lt;/li&gt;
&lt;li&gt;Customer Type&lt;/li&gt;
&lt;li&gt;Date&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 &lt;strong&gt;Design Principle:&lt;/strong&gt; The goal is &lt;strong&gt;clarity, not decoration&lt;/strong&gt;. Every visual should answer a specific business question.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 6: Validate Your Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before submitting your work, verify the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm all totals match your &lt;strong&gt;Excel analysis&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Test all slicers and filters for correct behavior&lt;/li&gt;
&lt;li&gt;Check visual titles, labels, and number formatting&lt;/li&gt;
&lt;li&gt;Ensure visuals respond correctly to user interactions&lt;/li&gt;
&lt;/ul&gt;
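
&lt;p&gt;If you prefer a script to Excel for the cross-check, a short pandas sketch can recompute the headline figures directly from the raw CSV. The column names below are assumptions; adjust them to match your file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Optional cross-check: recompute the KPI totals straight from the raw CSV.
# Column names "Total Sales Amount" and "Quantity Sold" are assumptions.
import pandas as pd

df = pd.read_csv("supermarket_transactions.csv")

print("Total Sales:", round(df["Total Sales Amount"].sum(), 2))
print("Total Quantity Sold:", int(df["Quantity Sold"].sum()))
print("Average Transaction Value:", round(df["Total Sales Amount"].mean(), 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The three printed values should match the Total Sales, Total Quantity Sold, and Average Transaction Value cards on your dashboard.&lt;/p&gt;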

&lt;h3&gt;
  
  
  &lt;strong&gt;Your Final Submission Should Include&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Record a &lt;strong&gt;4-minute walkthrough video&lt;/strong&gt; using &lt;strong&gt;Loom&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The video should demonstrate a &lt;strong&gt;fully functional Power BI dashboard&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Briefly explain:

&lt;ul&gt;
&lt;li&gt;The dataset used&lt;/li&gt;
&lt;li&gt;Key visuals and filters&lt;/li&gt;
&lt;li&gt;Main business insights and conclusions&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Upload the recording to &lt;strong&gt;Loom&lt;/strong&gt; and copy the shareable link&lt;/li&gt;
&lt;li&gt;Submit the &lt;strong&gt;Loom video link via WhatsApp&lt;/strong&gt; to &lt;strong&gt;0796 448 232&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>A Step-by-Step Guide to Streaming Live Weather Data Using Apache Kafka and Apache Cassandra</title>
      <dc:creator>Nelson Sammy</dc:creator>
      <pubDate>Wed, 16 Apr 2025 03:31:52 +0000</pubDate>
      <link>https://dev.to/luxdevhq/a-step-by-step-guide-to-streaming-live-weather-data-using-apache-kafka-and-apache-cassandra-ep2</link>
      <guid>https://dev.to/luxdevhq/a-step-by-step-guide-to-streaming-live-weather-data-using-apache-kafka-and-apache-cassandra-ep2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fegg2c0w9mp35vusns6kj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fegg2c0w9mp35vusns6kj.png" alt="Weather Data using Kafka,Confluent" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Delivering real-time weather data is increasingly important for applications across logistics, travel, emergency services, and consumer tools. In this tutorial, we will build a real-time weather data streaming pipeline using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenWeatherMap API to fetch weather data&lt;/li&gt;
&lt;li&gt;Apache Kafka (via Confluent Cloud) for streaming&lt;/li&gt;
&lt;li&gt;Apache Cassandra (installed on a Linux machine) for scalable storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll implement this pipeline using Python, demonstrate practical setups, and include screenshots to guide you through each step.&lt;/p&gt;

&lt;p&gt;By the end, you'll have a running system where weather data is continuously fetched, streamed to Kafka, and written to Cassandra for querying and visualization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaes96vt2r97k41gaomb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaes96vt2r97k41gaomb.png" alt="Weather Data Architecture using Kafka,Confluent" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+&lt;/li&gt;
&lt;li&gt;Linux Machine&lt;/li&gt;
&lt;li&gt;Kafka cluster on Confluent Cloud&lt;/li&gt;
&lt;li&gt;OpenWeatherMap API key &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Set Up Kafka on Confluent Cloud
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to confluent.cloud&lt;/li&gt;
&lt;li&gt;Create an account (free tier available)&lt;/li&gt;
&lt;li&gt;Create a Kafka cluster&lt;/li&gt;
&lt;li&gt;Create a topic named weather-stream&lt;/li&gt;
&lt;li&gt;Generate an API Key and Secret&lt;/li&gt;
&lt;li&gt;Note the Bootstrap Server, API Key, and API Secret&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Install Cassandra on a Linux Machine
&lt;/h3&gt;

&lt;p&gt;Open your terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install Java (required by Cassandra)
sudo apt install openjdk-11-jdk -y

# Add Apache Cassandra repo
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee /etc/apt/sources.list.d/cassandra.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

sudo apt update
sudo apt install cassandra -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start and verify Cassandra:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl enable cassandra
sudo systemctl start cassandra
nodetool status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Connect Cassandra to DBeaver (GUI Tool)
&lt;/h3&gt;

&lt;p&gt;DBeaver is a great visual interface for managing Cassandra.&lt;br&gt;
Steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install DBeaver&lt;/li&gt;
&lt;li&gt;Open DBeaver and click New Connection&lt;/li&gt;
&lt;li&gt;Select Apache Cassandra from the list&lt;/li&gt;
&lt;li&gt;Fill in the following:

&lt;ul&gt;
&lt;li&gt;Host: 127.0.0.1&lt;/li&gt;
&lt;li&gt;Port: 9042&lt;/li&gt;
&lt;li&gt;Username: leave blank (default auth)&lt;/li&gt;
&lt;li&gt;Password: leave blank&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Click Test Connection — you should see a success message&lt;/li&gt;
&lt;li&gt;Save and connect — you can now browse your keyspaces and tables, and run CQL visually&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 4: Create the Cassandra Table
&lt;/h3&gt;

&lt;p&gt;Once connected (or in cqlsh), run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE KEYSPACE IF NOT EXISTS weather
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE weather;

CREATE TABLE IF NOT EXISTS weather_data (
    city TEXT,
    timestamp TIMESTAMP,
    temperature FLOAT,
    humidity INT,
    PRIMARY KEY (city, timestamp)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This schema stores weather info per city, indexed by time.&lt;br&gt;
You can also run the above queries in DBeaver’s SQL editor.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Create Kafka Producer in Python
&lt;/h3&gt;

&lt;p&gt;Install Dependencies&lt;br&gt;
&lt;code&gt;pip install requests confluent-kafka python-dotenv&lt;/code&gt;&lt;br&gt;
Create a .env file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BOOTSTRAP_SERVERS=pkc-xyz.us-central1.gcp.confluent.cloud:9092
SASL_USERNAME=API_KEY
SASL_PASSWORD=API_SECRET
OPENWEATHER_API_KEY=YOUR_OPENWEATHER_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python Script: weather_producer.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
import json
from confluent_kafka import Producer
import time
from dotenv import load_dotenv
import os

load_dotenv()

conf = {
    'bootstrap.servers': os.getenv("BOOTSTRAP_SERVERS"),
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': os.getenv("SASL_USERNAME"),
    'sasl.password': os.getenv("SASL_PASSWORD")
}

producer = Producer(conf)
API_KEY = os.getenv("OPENWEATHER_API_KEY")
TOPIC = 'weather-stream'
CITIES = ["Nairobi", "Lagos", "Accra", "Cairo", "Cape Town", "Addis Ababa", "Dakar", "Kampala", "Algiers"]

def get_weather(city):
    url = f'https://api.openweathermap.org/data/2.5/weather?q={city}&amp;amp;appid={API_KEY}&amp;amp;units=metric'
    response = requests.get(url)
    return response.json()

def delivery_report(err, msg):
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

while True:
    for city in CITIES:
        weather = get_weather(city)
        weather['city'] = city  # Attach city explicitly
        producer.produce(TOPIC, json.dumps(weather).encode('utf-8'), callback=delivery_report)
        producer.flush()
        time.sleep(2)  # Brief pause between cities to avoid hitting the API rate limit
    time.sleep(60)  # Wait before the next full cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script loads credentials from .env, loops through several African cities, and sends weather data to your Kafka topic.&lt;/p&gt;
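
&lt;p&gt;For reference, the only fields the consumer in the next step relies on are &lt;code&gt;main.temp&lt;/code&gt;, &lt;code&gt;main.humidity&lt;/code&gt;, and the &lt;code&gt;city&lt;/code&gt; key attached above. The sketch below shows an abridged message as a Python dict with illustrative values; the real OpenWeatherMap response contains many more fields:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Abridged example of one message as produced above.
# Values are illustrative; the full OpenWeatherMap response has many more fields.
sample_message = {
    "name": "Nairobi",
    "main": {"temp": 21.4, "humidity": 62},
    "city": "Nairobi",  # attached explicitly by the producer
}

# The consumer in the next step reads exactly these fields:
print(sample_message["city"], sample_message["main"]["temp"], sample_message["main"]["humidity"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;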

&lt;h3&gt;
  
  
  Step 6: Create Kafka Consumer in Python (Store Data in Cassandra)
&lt;/h3&gt;

&lt;p&gt;Install additional libraries:&lt;br&gt;
&lt;code&gt;pip install cassandra-driver&lt;/code&gt;&lt;br&gt;
Python Script: weather_consumer.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
from cassandra.cluster import Cluster
from confluent_kafka import Consumer
import os
from dotenv import load_dotenv

load_dotenv()

# Cassandra connection
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
session.set_keyspace('weather')

# Kafka configuration
conf = {
    'bootstrap.servers': os.getenv("BOOTSTRAP_SERVERS"),
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': os.getenv("SASL_USERNAME"),
    'sasl.password': os.getenv("SASL_PASSWORD"),
    'group.id': 'weather-group',
    'auto.offset.reset': 'earliest'
}

consumer = Consumer(conf)
consumer.subscribe(['weather-stream'])

print("Listening for weather data...")

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print(f"Consumer error: {msg.error()}")
        continue

    data = json.loads(msg.value().decode('utf-8'))
    try:
        session.execute(
            """
            INSERT INTO weather_data (city, timestamp, temperature, humidity)
            VALUES (%s, toTimestamp(now()), %s, %s)
            """,
            (data['city'], data['main']['temp'], data['main']['humidity'])
        )
        print(f"Stored data for {data['city']}")
    except Exception as e:
        print(f"Failed to insert data: {e}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This consumer listens to your Kafka topic, parses incoming messages, and stores them in the weather_data table.&lt;/p&gt;
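
&lt;p&gt;One practical refinement, not shown above: the loop as written never releases its Kafka or Cassandra connections. A minimal graceful-shutdown sketch, assuming the same &lt;code&gt;consumer&lt;/code&gt;, &lt;code&gt;session&lt;/code&gt;, and &lt;code&gt;cluster&lt;/code&gt; objects, looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: wrap the poll loop so Ctrl+C closes connections cleanly.
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # ... same parsing and INSERT logic as above ...
except KeyboardInterrupt:
    print("Stopping consumer...")
finally:
    consumer.close()      # commits offsets and leaves the consumer group
    cluster.shutdown()    # closes the Cassandra connection pool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;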

&lt;h3&gt;
  
  
  Step 7: Querying Cassandra Data via DBeaver
&lt;/h3&gt;

&lt;p&gt;Once the consumer is running and data is flowing, open DBeaver and run a CQL query to verify the data:&lt;br&gt;
&lt;code&gt;SELECT * FROM weather.weather_data;&lt;/code&gt;&lt;br&gt;
You should now see rows of weather data streaming in from various African cities.&lt;/p&gt;
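
&lt;p&gt;If you prefer to check from Python instead of DBeaver, a short optional script using the same &lt;code&gt;cassandra-driver&lt;/code&gt; can read back a sample of the stored rows (adjust the LIMIT to taste):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Optional: query the stored weather data from Python instead of DBeaver.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('weather')

rows = session.execute("SELECT city, timestamp, temperature, humidity FROM weather_data LIMIT 20")
for row in rows:
    print(row.city, row.timestamp, row.temperature, row.humidity)

cluster.shutdown()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;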

&lt;h2&gt;
  
  
  Conclusion &amp;amp; Next Steps
&lt;/h2&gt;

&lt;p&gt;You’ve successfully built a real-time data pipeline using Python, Kafka, and Cassandra. Here’s a summary of what you’ve done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up Kafka via Confluent Cloud&lt;/li&gt;
&lt;li&gt;Pulled real-time weather data using OpenWeatherMap&lt;/li&gt;
&lt;li&gt;Streamed data to Kafka via a Python producer&lt;/li&gt;
&lt;li&gt;Consumed Kafka events and stored them in Cassandra&lt;/li&gt;
&lt;li&gt;Queried Cassandra data in DBeaver&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Suggested Enhancements:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add Weather Alerts: Trigger notifications if temperatures exceed a threshold&lt;/li&gt;
&lt;li&gt;Streamlit Dashboard: Build a live dashboard showing city-by-city weather updates&lt;/li&gt;
&lt;li&gt;Data Retention Policy: Expire older data using Cassandra TTL (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Dockerize the Project: For easier deployment &lt;/li&gt;
&lt;/ul&gt;
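
&lt;p&gt;As a starting point for the retention idea above, the consumer's INSERT can attach a TTL so Cassandra expires each row automatically. A sketch with a 7-day TTL (604800 seconds), using the same &lt;code&gt;session&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; objects as the consumer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Same INSERT as the consumer, but each row expires after 7 days (604800 s).
session.execute(
    """
    INSERT INTO weather_data (city, timestamp, temperature, humidity)
    VALUES (%s, toTimestamp(now()), %s, %s)
    USING TTL 604800
    """,
    (data['city'], data['main']['temp'], data['main']['humidity'])
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;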

</description>
      <category>dataengineering</category>
      <category>ai</category>
      <category>kafka</category>
    </item>
    <item>
      <title>Building an Automated Weather Data Pipeline with Apache Kafka and Cassandra</title>
      <dc:creator>Eric Katumo</dc:creator>
      <pubDate>Mon, 07 Apr 2025 10:02:16 +0000</pubDate>
      <link>https://dev.to/luxdevhq/building-an-automated-weather-data-pipeline-with-apache-kafka-and-cassandra-23m</link>
      <guid>https://dev.to/luxdevhq/building-an-automated-weather-data-pipeline-with-apache-kafka-and-cassandra-23m</guid>
<description>&lt;p&gt;This article walks you through building an end-to-end weather data pipeline that collects weather data from multiple African cities, processes it through Apache Kafka, and stores it in a Cassandra database for analysis. The infrastructure is hosted on Microsoft Azure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Weather data pipelines are essential components of modern environmental monitoring systems. They enable organizations to collect, process, and analyze meteorological information in real-time, facilitating better decision-making. This tutorial demonstrates how to build a simple yet robust weather data pipeline using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt; for data fetching and processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenWeatherMap API&lt;/strong&gt; as our data source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Kafka&lt;/strong&gt; for handling real-time data streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Cassandra&lt;/strong&gt; for scalable data storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Azure&lt;/strong&gt; for cloud infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this article, you will understand how these technologies are combined to construct a data pipeline that can be scaled to more complex use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Our weather data pipeline consists of the following workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Extraction&lt;/strong&gt;: Weather data is extracted for five African cities from the OpenWeatherMap API using the producer script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Streaming&lt;/strong&gt;: Weather data is streamed into a Kafka topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Consumption&lt;/strong&gt;: A consumer script reads from the Kafka topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Storage&lt;/strong&gt;: The data is processed and stored in a Cassandra database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Query&lt;/strong&gt;: We can query the stored data using CQL (Cassandra Query Language)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Azure Infrastructure Setup
&lt;/h2&gt;

&lt;p&gt;This project leverages Microsoft Azure for hosting our components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Azure Virtual Machine&lt;/strong&gt;: Ubuntu 20.04 LTS server to run our Python scripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Security Group&lt;/strong&gt;: Configured to allow necessary traffic for Kafka and Cassandra&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Managed Disks&lt;/strong&gt;: For persistent storage of Cassandra data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To set up an Azure VM for this project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log in to the &lt;a href="https://portal.azure.com" rel="noopener noreferrer"&gt;Azure Portal&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create a new Virtual Machine with Ubuntu 20.04 LTS&lt;/li&gt;
&lt;li&gt;Select at least 2 vCPUs and 8GB RAM for optimal performance&lt;/li&gt;
&lt;li&gt;Configure networking to allow inbound SSH (port 22), Kafka (port 9092), and Cassandra (port 9042)&lt;/li&gt;
&lt;li&gt;Create and download SSH keys for secure access&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After provisioning, connect to your VM using SSH:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-i&lt;/span&gt; /path/to/your/key.pem azureuser@your-vm-ip-address
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up Confluent Cloud for Your Weather Data Pipeline
&lt;/h2&gt;

&lt;p&gt;Before diving into the implementation, we'll set up a managed Kafka environment on Confluent Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a Confluent Cloud Account
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Visit &lt;a href="https://www.confluent.io/" rel="noopener noreferrer"&gt;Confluent.io&lt;/a&gt; and click on "Get Started Free" &lt;/li&gt;
&lt;li&gt;Complete the registration process by providing your email and setting up a password&lt;/li&gt;
&lt;li&gt;Verify your email address to activate your account&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Setting Up a Kafka Cluster
&lt;/h3&gt;

&lt;p&gt;Once logged in to Confluent Cloud:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on "Create cluster" in the dashboard&lt;/li&gt;
&lt;li&gt;Choose Azure as your cloud provider and select the region closest to your Azure VM&lt;/li&gt;
&lt;li&gt;Select the "Basic" cluster type for development purposes (you can upgrade later)&lt;/li&gt;
&lt;li&gt;Name your cluster (e.g., "weather-pipeline-cluster")&lt;/li&gt;
&lt;li&gt;Click "Launch cluster"&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Creating a Kafka Topic
&lt;/h3&gt;

&lt;p&gt;After your cluster is provisioned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select your cluster from the dashboard&lt;/li&gt;
&lt;li&gt;Navigate to the "Topics" section in the left sidebar&lt;/li&gt;
&lt;li&gt;Click "Create topic"&lt;/li&gt;
&lt;li&gt;Enter "weather_data" as the topic name&lt;/li&gt;
&lt;li&gt;Set the number of partitions (start with 6 for this example)&lt;/li&gt;
&lt;li&gt;Leave the default retention settings (7 days)&lt;/li&gt;
&lt;li&gt;Click "Create with defaults" (or customize advanced settings if needed)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Creating API Keys for Authentication
&lt;/h3&gt;

&lt;p&gt;To connect your application to the Confluent Cloud Kafka cluster:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your cluster dashboard, click on "API keys" in the left sidebar&lt;/li&gt;
&lt;li&gt;Click "Create key"&lt;/li&gt;
&lt;li&gt;Select "Global access" (or "Granular access" if you prefer more control)&lt;/li&gt;
&lt;li&gt;Provide a description (e.g., "Weather Pipeline API Key")&lt;/li&gt;
&lt;li&gt;Click "Create key"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Important&lt;/strong&gt;: Save both the API key and secret in a secure location as they will only be shown once&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Testing the Connection
&lt;/h3&gt;

&lt;p&gt;To verify your connection settings, create a simple test script on your Azure VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka.admin&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AdminClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Kafka configuration
&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BOOTSTRAP_SERVERS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security.protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SASL_SSL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.mechanisms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_SECRET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create Admin client
&lt;/span&gt;&lt;span class="n"&gt;admin_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AdminClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# List topics
&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;admin_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_topics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Available topics:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up Cassandra on Azure VM
&lt;/h2&gt;

&lt;p&gt;We'll install Apache Cassandra directly on our Azure VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Java&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; openjdk-8-jdk

&lt;span class="c"&gt;# Add the Apache Cassandra repository&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb https://downloads.apache.org/cassandra/debian 40x main"&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; /etc/apt/sources.list.d/cassandra.list
curl https://downloads.apache.org/cassandra/KEYS | &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-key add -
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update

&lt;span class="c"&gt;# Install Cassandra&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; cassandra

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start cassandra
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;cassandra

&lt;span class="c"&gt;# Verify Cassandra is running&lt;/span&gt;
nodetool status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure Cassandra to accept remote connections by editing &lt;code&gt;/etc/cassandra/cassandra.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/cassandra/cassandra.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the following values (note that in the actual file, &lt;code&gt;seeds&lt;/code&gt; is nested under &lt;code&gt;seed_provider&lt;/code&gt; → &lt;code&gt;parameters&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;listen_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;your-vm-ip-address&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;rpc_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0&lt;/span&gt;
&lt;span class="na"&gt;seeds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-vm-ip-address&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Cassandra to apply changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart cassandra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;Now on your Azure VM, set up the development environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a project directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;weather_data_pipeline
&lt;span class="nb"&gt;cd &lt;/span&gt;weather_data_pipeline

&lt;span class="c"&gt;# Create and activate a virtual environment&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv myvenv
&lt;span class="nb"&gt;source &lt;/span&gt;myvenv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then install the required packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;confluent-kafka requests python-dotenv cassandra-driver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file to store your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WEATHER_API_KEY=your_openweathermap_api_key
BOOTSTRAP_SERVERS=your_confluent_bootstrap_servers
KAFKA_API_KEY=your_confluent_api_key
KAFKA_API_SECRET=your_confluent_api_secret
CASSANDRA_HOST=localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating the Producer Script
&lt;/h2&gt;

&lt;p&gt;The producer script is responsible for fetching weather data from the OpenWeatherMap API and sending it to a Kafka topic. Let's break down the key components of our &lt;code&gt;weather_producer.py&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Producer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="c1"&gt;# Configure logging
&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# List of cities
&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Nairobi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Johannesburg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Casablanca&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lagos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kinshasa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# OpenWeatherMap API setup
&lt;/span&gt;&lt;span class="n"&gt;owm_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;WEATHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;owm_base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openweathermap.org/data/2.5/weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch weather data from OpenWeatherMap API using city name.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;owm_base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;appid=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;owm_api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;units=metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extracted_city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error fetching data for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delivery_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Callback for Kafka message delivery status.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message delivery failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message delivered to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] at offset &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Kafka configuration
&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BOOTSTRAP_SERVERS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security.protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SASL_SSL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.mechanisms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_SECRET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;broker.address.family&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message.send.max.retries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry.backoff.ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;produce_weather_data&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch weather data for each city and produce to Kafka.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;delivery_report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to fetch data for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;produce_weather_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data extraction and production complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notable Components of the Producer Script:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Environment Setup&lt;/strong&gt;: We import &lt;code&gt;dotenv&lt;/code&gt; to load environment variables and initialize logging to track script execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;City List&lt;/strong&gt;: We define a list of African cities for which we want to collect weather data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Integration&lt;/strong&gt;: The function &lt;code&gt;fetch_weather_data()&lt;/code&gt; makes HTTP requests to the OpenWeatherMap API, requesting metric units so temperatures are returned in Celsius.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka Configuration&lt;/strong&gt;: We set up a Kafka producer with security credentials and reliability properties.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Production&lt;/strong&gt;: The &lt;code&gt;produce_weather_data()&lt;/code&gt; function fetches data for each city and produces it to the "weather_data" Kafka topic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Delivery Reporting&lt;/strong&gt;: The &lt;code&gt;delivery_report()&lt;/code&gt; callback logs whether each message was delivered to Kafka successfully.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
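
&lt;p&gt;Note that, unlike a continuously looping producer, this script runs once and exits, so automation comes from running it on a schedule. One simple option, shown here purely as an assumed sketch, is to wrap the call in a timed loop (alternatively, keep the script single-run and schedule it externally, e.g. with cron):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative scheduling wrapper (an assumption, not part of the original script):
# re-run the producer every 10 minutes using the functions defined above.
import time

if __name__ == "__main__":
    while True:
        produce_weather_data()
        logger.info("Data extraction and production complete")
        time.sleep(600)  # wait 10 minutes before the next cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;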

&lt;h2&gt;
  
  
  Creating the Consumer Script
&lt;/h2&gt;

&lt;p&gt;Now, let's create the consumer script that will read data from the Kafka topic and store it in Cassandra. Here is our &lt;code&gt;weather_consumer.py&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KafkaError&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cassandra.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cluster&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="c1"&gt;# Configure logging
&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Kafka configuration
&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BOOTSTRAP_SERVERS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_consumer_group&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto.offset.reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;earliest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security.protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SASL_SSL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.mechanisms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sasl.password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KAFKA_API_SECRET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;broker.address.family&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Cassandra configuration
&lt;/span&gt;&lt;span class="n"&gt;cassandra_host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CASSANDRA_HOST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Connect to Cassandra
&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;cassandra_host&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize_cassandra&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize Cassandra connection and create keyspace/table if needed.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Create keyspace if it doesn't exist
&lt;/span&gt;        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            CREATE KEYSPACE IF NOT EXISTS weather_data
            WITH replication = {&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SimpleStrategy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replication_factor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;}
            AND durable_writes = true;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Use the keyspace
&lt;/span&gt;        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE weather_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create table if it doesn't exist
&lt;/span&gt;        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            CREATE TABLE IF NOT EXISTS simple_weather (
                id uuid PRIMARY KEY,
                city_name text,
                temperature float,
                timestamp timestamp,
                weather_description text,
                weather_main text
            );
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cassandra table ready&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cassandra initialization error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;insert_weather_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Insert weather data into Cassandra.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract interesting fields
&lt;/span&gt;        &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extracted_city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;weather_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;weather_main&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Insert data into Cassandra
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            INSERT INTO simple_weather (id, city_name, temperature, timestamp, weather_description, weather_main)
            VALUES (%s, %s, %s, toTimestamp(now()), %s, %s)
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weather_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weather_main&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inserted weather for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error inserting data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consume_weather_data&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Consume weather data from Kafka and store in Cassandra.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Initialize Cassandra
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;initialize_cassandra&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# Create Kafka consumer
&lt;/span&gt;    &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subscribed to topic: weather_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;KafkaError&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_PARTITION_EOF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consumer error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
                &lt;span class="nf"&gt;insert_weather_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error processing message: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;KeyboardInterrupt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopping consumer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shutdown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;consume_weather_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Components of the Consumer Script:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka Consumer Configuration&lt;/strong&gt;: We configure a consumer that subscribes to the "weather_data" topic with the appropriate security settings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cassandra Connection&lt;/strong&gt;: We connect to the Cassandra cluster directly without authentication credentials.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Database Initialization&lt;/strong&gt;: The &lt;code&gt;initialize_cassandra()&lt;/code&gt; function creates the necessary keyspace and table if they haven't already been created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Processing&lt;/strong&gt;: The &lt;code&gt;insert_weather_data()&lt;/code&gt; function extracts pertinent fields from the weather data and inserts them into Cassandra.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Continuous Consumption&lt;/strong&gt;: The &lt;code&gt;consume_weather_data()&lt;/code&gt; function continuously polls the Kafka topic and processes any messages it finds.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Running the Pipeline on Azure
&lt;/h2&gt;

&lt;p&gt;Now that we've written both the producer and consumer scripts, let's run them on our Azure VM to see our data pipeline in action.&lt;/p&gt;

&lt;p&gt;First, make sure your virtual environment is activated and all dependencies are installed. Then, run the producer script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 weather_producer.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown in the first screenshot, the producer is successfully retrieving weather data and producing it to Kafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-04-07 05:56:47,013 - __main__ - INFO - Message delivered to weather_data [4] at offset 8
2025-04-07 05:56:47,013 - __main__ - INFO - Message delivered to weather_data [4] at offset 9
2025-04-07 05:56:47,035 - __main__ - INFO - Message delivered to weather_data [3] at offset 8
2025-04-07 05:56:47,035 - __main__ - INFO - Message delivered to weather_data [3] at offset 9
2025-04-07 05:56:47,072 - __main__ - INFO - Message delivered to weather_data [2] at offset 4
2025-04-07 05:56:47,072 - __main__ - INFO - Data extraction and production complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, run the consumer script in a separate terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 weather_consumer.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The consumer subscribes to the Kafka topic and begins processing messages, as shown in the second screenshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subscribed to topic: weather_data
Cassandra table ready
Inserted weather for Johannesburg at 2025-04-07 05:56:45
Inserted weather for Lagos at 2025-04-07 05:55:29
Inserted weather for Nairobi at 2025-04-07 05:56:45
Inserted weather for Casablanca at 2025-04-07 05:53:05
Inserted weather for Kinshasa at 2025-04-07 05:56:46
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Querying the Data
&lt;/h2&gt;

&lt;p&gt;Now that the data is in Cassandra, we can query it using CQL. On your Azure VM, use the &lt;code&gt;cqlsh&lt;/code&gt; tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cqlsh localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run a query to retrieve the weather data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;simple_weather&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown in the third screenshot, this returns all of the weather data for our cities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id                                   | city_name     | temperature | timestamp                       | weather_description | weather_main
--------------------------------------+--------------+-------------+--------------------------------+---------------------+-------------
9781d8f2-e7d1-484f-a780-4df61ed7c7da | Johannesburg |       15.02 | 2025-04-07 06:22:41.000000+0000 |    overcast clouds  |      Clouds
f64ab11f-a732-4d47-84fb-f450d1e4a9bc |     Kinshasa |       21.21 | 2025-04-07 06:22:41.000000+0000 |         few clouds  |      Clouds
01840d74-0b10-461b-98b4-8eaf5fce2e0c |      Nairobi |       18.62 | 2025-04-07 06:22:41.000000+0000 |     broken clouds   |      Clouds
546d4df6-45b4-4956-94b1-b92b92bc1e84 |        Lagos |       26.57 | 2025-04-07 06:22:41.000000+0000 |    overcast clouds  |      Clouds
95d0dc97-a940-45a9-ab5d-28b9deba7b02 |   Casablanca |       14.07 | 2025-04-07 06:20:30.000000+0000 |   scattered clouds  |      Clouds
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
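
&lt;p&gt;If you prefer to query from Python instead of &lt;code&gt;cqlsh&lt;/code&gt;, the same cassandra-driver session pattern used by the consumer can run the query. A minimal sketch (assuming Cassandra is reachable on localhost):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from cassandra.cluster import Cluster

# Connect to the weather_data keyspace and read back the stored rows
cluster = Cluster(["localhost"])
session = cluster.connect("weather_data")

rows = session.execute("SELECT city_name, temperature, weather_main FROM simple_weather")
for row in rows:
    print(f"{row.city_name}: {row.temperature} ({row.weather_main})")

cluster.shutdown()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;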



&lt;h2&gt;
  
  
  Extending the Pipeline
&lt;/h2&gt;

&lt;p&gt;This basic pipeline can be extended as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add More Cities&lt;/strong&gt;: Expand the list of cities for broader geographic coverage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collect More Data Points&lt;/strong&gt;: Modify the scripts to collect additional weather parameters like humidity, wind speed, and barometric pressure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Aggregation&lt;/strong&gt;: Add functionality to calculate averages, minimums, and maximums over time intervals of varying sizes (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visualizations&lt;/strong&gt;: Connect visualization software such as Grafana or Azure Data Explorer to your Cassandra database to create dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alerts&lt;/strong&gt;: Configure alerts for extreme weather conditions by adding thresholds for temperature, rainfall, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale with Azure Kubernetes Service&lt;/strong&gt;: For larger deployments, consider containerizing your applications and deploying them on AKS.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
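
&lt;p&gt;As an illustration of the aggregation idea above, you could periodically read the raw rows out of Cassandra and summarize them with pandas. A minimal sketch (pandas is an extra dependency here, and the hourly window is just an example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd
from cassandra.cluster import Cluster

cluster = Cluster(["localhost"])
session = cluster.connect("weather_data")

# Pull the raw readings into a DataFrame
rows = session.execute("SELECT city_name, temperature, timestamp FROM simple_weather")
df = pd.DataFrame(rows.all(), columns=["city_name", "temperature", "timestamp"])
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Min / mean / max temperature per city per hour
summary = (
    df.groupby(["city_name", pd.Grouper(key="timestamp", freq="1h")])["temperature"]
      .agg(["min", "mean", "max"])
)
print(summary)

cluster.shutdown()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;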

&lt;h2&gt;
  
  
  Common Issues and Troubleshooting
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Rate Limiting&lt;/strong&gt;: The OpenWeatherMap API rate-limits free-tier accounts. If you're polling a large number of cities, you might hit these limits; consider a paid tier or request throttling (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka Connection Issues&lt;/strong&gt;: Ensure that your Kafka credentials and bootstrap servers are correctly configured in the &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cassandra Connectivity&lt;/strong&gt;: Ensure that your Cassandra instance is accessible on your Azure VM and that the firewall rules allow connections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Azure VM Connectivity&lt;/strong&gt;: Check that your Network Security Group allows the necessary inbound and outbound traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Disk Space&lt;/strong&gt;: Monitor your Azure VM's disk space, especially if storing large amounts of historical weather data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
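
&lt;p&gt;A simple way to stay under the free-tier limit is to pause between API calls in the producer. A minimal sketch of request throttling (the delay value is illustrative, not an OpenWeatherMap requirement):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

import requests

CITIES = ["Nairobi", "Lagos", "Johannesburg", "Casablanca", "Kinshasa"]
REQUEST_DELAY_SECONDS = 1.5  # illustrative pause between calls; tune to your plan's limits

def fetch_all_cities(api_key):
    """Fetch current weather for each city, pausing between requests."""
    results = []
    for city in CITIES:
        url = (
            "https://api.openweathermap.org/data/2.5/weather"
            f"?q={city}&amp;amp;appid={api_key}"
        )
        results.append(requests.get(url, timeout=10).json())
        time.sleep(REQUEST_DELAY_SECONDS)  # throttle so the free tier isn't exhausted
    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;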

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/documentation/" rel="noopener noreferrer"&gt;Apache Kafka Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cassandra.apache.org/doc/latest/" rel="noopener noreferrer"&gt;Apache Cassandra Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html" rel="noopener noreferrer"&gt;Confluent Kafka Python Client&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/cloud/current/overview.html" rel="noopener noreferrer"&gt;Confluent Cloud Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openweathermap.org/api" rel="noopener noreferrer"&gt;OpenWeatherMap API Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dataengineering</category>
      <category>data</category>
      <category>apachekafka</category>
    </item>
    <item>
      <title>Building an Automated Bitcoin Price ETL Pipeline with Airflow and PostgreSQL</title>
      <dc:creator>Eric Katumo</dc:creator>
      <pubDate>Mon, 31 Mar 2025 19:07:56 +0000</pubDate>
      <link>https://dev.to/luxdevhq/building-an-automated-bitcoin-price-etl-pipeline-with-airflow-and-postgresql-1ok7</link>
      <guid>https://dev.to/luxdevhq/building-an-automated-bitcoin-price-etl-pipeline-with-airflow-and-postgresql-1ok7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article details creating an automated ETL (Extract, Transform, Load) pipeline that retrieves daily Bitcoin price data from the Polygon.io API, performs necessary transformations, and loads the data into a PostgreSQL database. The workflow is orchestrated using Apache Airflow, ensuring reliable daily execution.&lt;/p&gt;

&lt;p&gt;This project demonstrates several key data engineering concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API data extraction&lt;/li&gt;
&lt;li&gt;Data transformation using pandas&lt;/li&gt;
&lt;li&gt;Database integration with PostgreSQL&lt;/li&gt;
&lt;li&gt;Workflow orchestration with Apache Airflow&lt;/li&gt;
&lt;li&gt;Deployment to a cloud environment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;The pipeline consists of the following components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Source&lt;/strong&gt;: Polygon.io API providing cryptocurrency price data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETL Script&lt;/strong&gt;: Python script that handles extraction, transformation, and loading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: PostgreSQL for data storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration&lt;/strong&gt;: Apache Airflow for scheduling and monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Cloud VM for hosting the pipeline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system flows in a linear fashion: Airflow triggers the ETL script daily, which extracts the latest BTC prices, transforms the data into a suitable format, and loads it into the PostgreSQL database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detailed Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Creating the ETL Script
&lt;/h3&gt;

&lt;p&gt;The first component is &lt;code&gt;btc_prices.py&lt;/code&gt;, which handles the core ETL functionality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="c1"&gt;# Define API endpoint
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.polygon.io/v1/open-close/crypto/BTC/USD/2025-03-31?adjusted=true&amp;amp;apiKey=YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;open_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;close_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;close&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;symbol&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to retrieve data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Prepare data for insertion
&lt;/span&gt;&lt;span class="n"&gt;data_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;symbol&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;open_price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;open_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;close_price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;close_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;dbname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbname&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create database connection
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dbname&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crypto_prices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataengineering&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully loaded crypto data for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracts Bitcoin price data from the Polygon.io API&lt;/li&gt;
&lt;li&gt;Transforms and structures the data using pandas&lt;/li&gt;
&lt;li&gt;Loads the data into PostgreSQL&lt;/li&gt;
&lt;li&gt;Uses environment variables for secure database connection management&lt;/li&gt;
&lt;/ul&gt;
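
&lt;p&gt;One caveat: the request URL above hardcodes both the date and the API key. For a daily schedule you would typically derive the date at runtime and keep the key in the &lt;code&gt;.env&lt;/code&gt; file alongside the database credentials. A minimal sketch (&lt;code&gt;POLYGON_API_KEY&lt;/code&gt; is an assumed variable name, not one defined earlier in this article):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
from datetime import date, timedelta

from dotenv import load_dotenv

load_dotenv()

# Assumed .env entry; add POLYGON_API_KEY=... next to the database settings
api_key = os.getenv("POLYGON_API_KEY")

# Use the previous day so we never request a day that has not closed yet
target_day = (date.today() - timedelta(days=1)).isoformat()

url = (
    "https://api.polygon.io/v1/open-close/crypto/BTC/USD/"
    f"{target_day}?adjusted=true&amp;amp;apiKey={api_key}"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;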

&lt;h3&gt;
  
  
  Step 2: Creating the Airflow DAG
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;code&gt;btc_dag.py&lt;/code&gt; file defines the Airflow DAG (Directed Acyclic Graph) that orchestrates the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.bash_operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BashOperator&lt;/span&gt;

&lt;span class="c1"&gt;# DAG default arguments
&lt;/span&gt;&lt;span class="n"&gt;default_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;owner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_engineer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;depends_on_past&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_on_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_on_retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry_delay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;polygon_btc_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@daily&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;activate_venv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;activate_virtual_env&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source /home/user/project/venv/bin/activate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;execute_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;execute_python_file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;python /home/user/project/btc_prices.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;activate_venv&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;execute_file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This DAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defines the execution schedule&lt;/li&gt;
&lt;li&gt;Activates the virtual environment&lt;/li&gt;
&lt;li&gt;Executes the ETL script&lt;/li&gt;
&lt;/ul&gt;
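
&lt;p&gt;One thing to watch: each &lt;code&gt;BashOperator&lt;/code&gt; runs its command in its own shell, so sourcing the virtual environment in the first task does not carry over to the second. A common workaround is to call the virtual environment's interpreter directly, which also collapses the workflow into a single task. A minimal sketch (this would replace the two tasks inside the same &lt;code&gt;with DAG(...)&lt;/code&gt; block):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    # Inside the "with DAG(...) as dag:" block from btc_dag.py
    run_etl = BashOperator(
        task_id="run_btc_etl",
        # Use the venv's interpreter directly instead of "source .../activate"
        bash_command="/home/user/project/venv/bin/python /home/user/project/btc_prices.py",
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;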

&lt;h3&gt;
  
  
  Step 3: Setting Up the Environment
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Creating a Virtual Environment&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
   &lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Installing Dependencies&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;requests pandas sqlalchemy python-dotenv psycopg2-binary apache-airflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setting Up Environment Variables&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"dbname=your_database_name"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"user=your_database_user"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"password=your_database_password"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"host=your_database_host"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"port=your_database_port"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Server Deployment
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SSH into the cloud VM&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   ssh user@your_server_ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create necessary directories&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/crypto_price
   &lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/airflow/dags
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transfer scripts to the server&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   scp btc_prices.py user@your_server_ip:~/crypto_price/
   scp btc_dag.py user@your_server_ip:~/airflow/dags/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: PostgreSQL Configuration
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Creating Database Schema&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;dataengineering&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;dataengineering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crypto_prices&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;open_price&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;close_price&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
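
&lt;p&gt;After the first DAG run, you can confirm rows are landing in the table by reading them back. A minimal sketch using pandas and the same &lt;code&gt;.env&lt;/code&gt; variables as the ETL script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()
engine = create_engine(
    f"postgresql://{os.getenv('user')}:{os.getenv('password')}"
    f"@{os.getenv('host')}:{os.getenv('port')}/{os.getenv('dbname')}"
)

# Read back the most recently loaded rows
df = pd.read_sql(
    "SELECT symbol, open_price, close_price, date "
    "FROM dataengineering.crypto_prices ORDER BY date DESC LIMIT 5",
    con=engine,
)
print(df)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;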



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The architecture follows best practices for data engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separation of extraction, transformation, and loading concerns&lt;/li&gt;
&lt;li&gt;Secure credential management&lt;/li&gt;
&lt;li&gt;Robust error handling&lt;/li&gt;
&lt;li&gt;Automated scheduling&lt;/li&gt;
&lt;li&gt;Cloud-based deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of Python, Airflow, and PostgreSQL provides a powerful foundation for financial data analysis, enabling timely insights into cryptocurrency market trends.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Katumo-Zzetu/BTC-Pipeline.git" rel="noopener noreferrer"&gt;Github &lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>data</category>
      <category>apacheairflow</category>
    </item>
    <item>
      <title>Amazon Redshift Data Warehousing (Sample Project)</title>
      <dc:creator>Eric Katumo</dc:creator>
      <pubDate>Wed, 26 Mar 2025 15:32:33 +0000</pubDate>
      <link>https://dev.to/luxdevhq/amazon-redshift-data-warehousing-sample-project-3h86</link>
      <guid>https://dev.to/luxdevhq/amazon-redshift-data-warehousing-sample-project-3h86</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;This project focuses on analyzing two key business processes using Amazon Redshift:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identifying the most common &lt;strong&gt;frequency of purchases&lt;/strong&gt; per season.&lt;/li&gt;
&lt;li&gt;Finding &lt;strong&gt;products with high review ratings&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1: Setting Up Amazon Redshift
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a Redshift Cluster&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to AWS Redshift Console.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create Cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose a &lt;strong&gt;single-node&lt;/strong&gt; cluster.&lt;/li&gt;
&lt;li&gt;Select an appropriate &lt;strong&gt;node type&lt;/strong&gt; (e.g., &lt;code&gt;dc2.large&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Set the &lt;strong&gt;database name, username, and password&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create an IAM Role&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the &lt;strong&gt;IAM Console&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Create a new role with &lt;strong&gt;AmazonS3ReadOnlyAccess&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Attach the role to your Redshift cluster.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create an S3 Bucket and Upload Data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Amazon S3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Create a new bucket (e.g., &lt;code&gt;zzetu&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Upload &lt;code&gt;shopping_data.csv&lt;/code&gt; to this bucket.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 2: Loading Data into Redshift
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use the Amazon Redshift Query Editor&lt;/strong&gt; to run SQL commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a table to store raw data&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Age&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Gender&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Category&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;Location&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Season&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ReviewRating&lt;/span&gt; &lt;span class="nb"&gt;REAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SubscriptionStatus&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;PaymentMethod&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ShippingType&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;DiscountApplied&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;PromoCodeUsed&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;PreviousPurchases&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PreferredPaymentMethod&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Copy data from S3 into Redshift&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'s3://zzetu/shopping_data.csv'&lt;/span&gt;
&lt;span class="n"&gt;IAM_ROLE&lt;/span&gt; &lt;span class="s1"&gt;'arn:aws:iam::your-account-id:role/your-redshift-role'&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;
&lt;span class="n"&gt;IGNOREHEADER&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
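
&lt;p&gt;Before moving on to the schema, it helps to confirm the load actually worked. The checks below are a suggested addition (not part of the original walkthrough); they simply count and preview the rows that landed in the staging table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Row count should match the records in shopping_data.csv (excluding the header)
SELECT COUNT(*) FROM shopping;

-- Preview a few rows to confirm the columns were mapped correctly
SELECT * FROM shopping LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;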



&lt;h2&gt;
  
  
  Step 3: Schema Design
&lt;/h2&gt;

&lt;p&gt;We are using a star schema with 1 fact table and 3 dimension tables:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Dimension Tables&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dimCustomer&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Age&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Gender&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;Location&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;SubscriptionStatus&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;PreferredPaymentMethod&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dimProduct&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Category&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DiscountApplied&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;PromoCodeUsed&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dimTransaction&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;TransactionID&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PaymentMethod&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ShippingType&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Season&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;dimCustomer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2. Fact Table&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;factPurchases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Age&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ReviewRating&lt;/span&gt; &lt;span class="nb"&gt;REAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PreviousPurchases&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;dimCustomer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Data Insertion
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Insert data into dimCustomer&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;dimCustomer&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SubscriptionStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PreferredPaymentMethod&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SubscriptionStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PreferredPaymentMethod&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert data into dimProduct&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;dimProduct&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DiscountApplied&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PromoCodeUsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;Category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DiscountApplied&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PromoCodeUsed&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert data into dimTransaction&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;dimTransaction&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PaymentMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ShippingType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PaymentMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ShippingType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert data into factPurchases&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;factPurchases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReviewRating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PreviousPurchases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReviewRating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PreviousPurchases&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Business Queries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Most Used Frequency of Purchase per Season&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;PurchaseCount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;factPurchases&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;dimTransaction&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FrequencyOfPurchases&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PurchaseCount&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
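
&lt;p&gt;The query above returns a count for every season and frequency combination. If you only want the single most common frequency per season, one possible variation (a sketch, not part of the original project) ranks the combinations with a window function and keeps the top row per season:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Rank frequencies within each season and keep only the most common one
WITH season_counts AS (
    SELECT t.Season,
           t.FrequencyOfPurchases,
           COUNT(*) AS PurchaseCount,
           ROW_NUMBER() OVER (PARTITION BY t.Season ORDER BY COUNT(*) DESC) AS rn
    FROM factPurchases f
    JOIN dimTransaction t ON f.CustomerID = t.CustomerID
    GROUP BY t.Season, t.FrequencyOfPurchases
)
SELECT Season, FrequencyOfPurchases, PurchaseCount
FROM season_counts
WHERE rn = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;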



&lt;h3&gt;
  
  
  &lt;strong&gt;2. Products with High Review Ratings&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReviewRating&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AvgRating&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_Purchases&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;shopping&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomerID&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Category&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;AvgRating&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
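
&lt;p&gt;Since the goal is products with &lt;em&gt;high&lt;/em&gt; review ratings, you may also want to filter rather than only sort. A possible variation is shown below; the 4.0 cut-off is an assumption, so adjust it to whatever "high" means for your business:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Keep only categories whose average rating clears a chosen threshold
SELECT s.Category, AVG(f.ReviewRating) AS AvgRating
FROM factPurchases f
JOIN shopping s ON f.CustomerID = s.CustomerID
GROUP BY s.Category
HAVING AVG(f.ReviewRating) &gt;= 4.0
ORDER BY AvgRating DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;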



&lt;h2&gt;
  
  
  Setup Guide
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create Redshift Cluster&lt;/strong&gt; and configure IAM roles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load Data from S3&lt;/strong&gt; into Redshift tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run the SQL scripts&lt;/strong&gt; for creating tables and inserting data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute business queries&lt;/strong&gt; to generate insights.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Repository Link
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Katumo-Zzetu/Redshift-DE.git" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>data</category>
    </item>
    <item>
      <title>Understanding MCP and How AI Engineers Can Leverage It</title>
      <dc:creator>Mwenda Harun Mbaabu</dc:creator>
      <pubDate>Sun, 16 Mar 2025 09:48:50 +0000</pubDate>
      <link>https://dev.to/luxdevhq/understanding-mcp-and-how-ai-engineers-can-leverage-it-3e2i</link>
      <guid>https://dev.to/luxdevhq/understanding-mcp-and-how-ai-engineers-can-leverage-it-3e2i</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding MCP (Model Context Protocol Server) and How AI Engineers Can Leverage It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the rapidly evolving landscape of AI development, &lt;strong&gt;Model Context Protocol (MCP) Servers&lt;/strong&gt; have emerged as a game-changer. These servers facilitate efficient communication between AI models and applications, ensuring that contextual data is preserved and utilized effectively. For AI engineers working with large models like Mixtral, MCP servers provide the backbone for deploying scalable, intelligent agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is MCP (Model Context Protocol Server)?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) is a framework designed to handle &lt;strong&gt;context-aware AI interactions&lt;/strong&gt; by maintaining and efficiently managing session data. Traditional AI models often suffer from context loss, where a model loses track of previous interactions. MCP servers solve this by maintaining contextual state, enabling more coherent and intelligent responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Features of MCP Servers:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session Persistence:&lt;/strong&gt; Stores past interactions for improved model continuity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; Handles multiple AI model instances efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Context Management:&lt;/strong&gt; Adapts to evolving conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Coordination:&lt;/strong&gt; Supports interactions between multiple AI agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security &amp;amp; Access Control:&lt;/strong&gt; Ensures safe AI model communication.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Best MCP Servers for AI Engineers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI engineers can leverage various MCP servers to enhance their AI models' performance. Below are some of the top MCP implementations:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;LangChain Server&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open-source framework for contextual AI interactions.&lt;/li&gt;
&lt;li&gt;Provides tools for managing long conversations and memory.&lt;/li&gt;
&lt;li&gt;Works well with OpenAI, Mixtral, and other LLMs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;FastAPI with Redis (Custom MCP Implementation)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Leverages &lt;strong&gt;FastAPI&lt;/strong&gt; for fast API interactions.&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;Redis&lt;/strong&gt; for session persistence.&lt;/li&gt;
&lt;li&gt;Ideal for real-time applications needing scalable AI context management.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Haystack MCP&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Built for &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Optimized for &lt;strong&gt;knowledge-driven AI models&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Works well for applications like chatbots and enterprise search.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Ollama MCP&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight MCP server for local AI models.&lt;/li&gt;
&lt;li&gt;Focused on privacy-preserving AI deployments.&lt;/li&gt;
&lt;li&gt;Best suited for &lt;strong&gt;on-premises AI applications&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Use Cases of MCP in AI Applications&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MCP servers enable a range of powerful AI-driven solutions. Here are some critical use cases:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. AI-Powered Virtual Assistants&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ensures conversation continuity.&lt;/li&gt;
&lt;li&gt;Reduces redundant queries and improves personalization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Customer Support Chatbots&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Maintains context across sessions.&lt;/li&gt;
&lt;li&gt;Enhances chatbot accuracy and user satisfaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. AI-Powered Recommendation Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Uses historical interactions for better recommendations.&lt;/li&gt;
&lt;li&gt;Deployed in e-commerce, entertainment, and healthcare.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Financial AI Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enhances fraud detection using contextual transaction analysis.&lt;/li&gt;
&lt;li&gt;Improves financial forecasting through data-driven insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. AI in Healthcare&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Provides better patient interactions by remembering medical history.&lt;/li&gt;
&lt;li&gt;Supports AI-driven diagnostics and decision-making.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Roadmap for Building an AI Agent with Python and Mixtral Large&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building an AI agent using &lt;strong&gt;Python&lt;/strong&gt; and &lt;strong&gt;Mixtral Large&lt;/strong&gt; requires a structured approach. Below is a roadmap:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 1: Environment Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Install Python and necessary dependencies:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi redis mixtral openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Set up Redis for session persistence.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 2: Build the AI Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create an &lt;strong&gt;API backend&lt;/strong&gt; using FastAPI:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;

   &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decode_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Integrate &lt;strong&gt;Mixtral Large&lt;/strong&gt; for LLM responses:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

   &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;AI:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mixtral-large&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 3: Deploy the MCP Server&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Run the FastAPI MCP server:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   uvicorn app:app &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Set up a &lt;strong&gt;Redis-backed session manager&lt;/strong&gt; to maintain context.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 4: Integrate and Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Deploy the solution using &lt;strong&gt;Docker&lt;/strong&gt; and &lt;strong&gt;Kubernetes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Implement &lt;strong&gt;WebSockets&lt;/strong&gt; for real-time interaction.&lt;/li&gt;
&lt;li&gt;Optimize &lt;strong&gt;Mixtral model calls&lt;/strong&gt; for cost efficiency.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Final Thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) Servers are &lt;strong&gt;indispensable&lt;/strong&gt; in modern AI applications. Whether you're building virtual assistants, chatbots, or intelligent recommendation systems, MCP ensures &lt;strong&gt;context retention&lt;/strong&gt;, making AI more &lt;strong&gt;responsive and intelligent&lt;/strong&gt;. By leveraging Python and Mixtral Large, AI engineers can build scalable, stateful AI agents that drive real-world impact.&lt;/p&gt;




&lt;h3&gt;
  
  
  🚀 &lt;strong&gt;Next Steps:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Explore different &lt;strong&gt;MCP frameworks&lt;/strong&gt; to find the best fit.&lt;/li&gt;
&lt;li&gt;Optimize AI model deployments using &lt;strong&gt;Redis-backed session storage&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Implement &lt;strong&gt;multi-agent systems&lt;/strong&gt; using MCP servers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to build the next-gen AI agent? Start coding today!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The role of SQL in Data Analytics: Why Every Analyst Should Learn SQL</title>
      <dc:creator>Dennis-Kirimi</dc:creator>
      <pubDate>Tue, 11 Mar 2025 14:20:10 +0000</pubDate>
      <link>https://dev.to/luxdevhq/the-role-of-sql-in-data-analytics-why-every-analyst-should-learn-sql-kmi</link>
      <guid>https://dev.to/luxdevhq/the-role-of-sql-in-data-analytics-why-every-analyst-should-learn-sql-kmi</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where is this coming from?
&lt;/h3&gt;

&lt;p&gt;Data has become the main driver whenever a person or organization makes decisions. This calls for analysts who can manipulate, transform, and extract insights from that data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmlmnyyvbqpr6d5tmfs3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmlmnyyvbqpr6d5tmfs3.jpeg" alt="Image description" width="626" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any time an analyst interacts with data, there are a number of tools and languages available to complete the task. SQL is one of them!&lt;br&gt;
SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. It is a must-learn language whether you're a beginner or a professional in the data field.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq58bef4ifx029w9z1nwq.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq58bef4ifx029w9z1nwq.jpeg" alt="Image description" width="736" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The role of SQL in Data Analytics
&lt;/h2&gt;

&lt;p&gt;We've just mentioned that SQL is used for interacting with relational databases, right? SQL remains the unsung hero of data analysis.&lt;br&gt;
Data is stored in databases. A database is an organized collection of data stored electronically, often in the form of tables with rows and columns. But how do we interact with the data in those databases?&lt;br&gt;
SQL helps analysts retrieve, clean, and manipulate that data with ease.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdisg5ylwui4vyl015iov.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdisg5ylwui4vyl015iov.jpeg" alt="Image description" width="731" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  But Why Should Every Data Analyst Learn SQL?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. SQL helps you get the data you need from a database for your analysis.
&lt;/h3&gt;

&lt;p&gt;Can you imagine a company that is over 20 years old? The majority of its business data is stored in relational databases such as MySQL, PostgreSQL, and SQL Server. SQL helps you, as an analyst, pull only the data you need, say last year's sales, and perform further analysis without having to retrieve everything else (see the example after the image below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nq6cj0vp4wsnqkaaqx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nq6cj0vp4wsnqkaaqx3.png" alt="Image description" width="800" height="210"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. SQL is built for interacting with large datasets.
&lt;/h3&gt;

&lt;p&gt;Excel might work for small reports. But what if you're dealing with millions of rows of data? That's where SQL comes in again, hurray!&lt;br&gt;
SQL makes working with large datasets easy and spares your computer from crashing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl510u7xerw9j699i2x8h.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl510u7xerw9j699i2x8h.jpeg" alt="Image description" width="736" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Data Cleaning
&lt;/h3&gt;

&lt;p&gt;Most of the time, data comes in a dirty format: missing values, duplicates, and other issues. SQL gives you plenty of room to use built-in functions to clean the data and prepare it for visualization. This could mean removing duplicates or replacing missing values with the most appropriate substitutes, depending on the analysis (see the example after the image below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu1ngr15kgah8r8xz10a.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu1ngr15kgah8r8xz10a.jpeg" alt="Image description" width="735" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Merging different tables
&lt;/h3&gt;

&lt;p&gt;In databases, data is stored in schemas and tables, and the data in those tables is often related.&lt;br&gt;
With a simple query, you can join the tables and create a view for further analysis, as shown in the sketch below. This is hard to achieve with tools like Excel.&lt;/p&gt;
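
&lt;p&gt;A minimal sketch of such a join, using hypothetical &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Combine customer details with their orders in a single result set
SELECT c.customer_name, o.order_date, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;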

&lt;h3&gt;
  
  
  5. SQL integrates with a number of popular visualization tools
&lt;/h3&gt;

&lt;p&gt;After you're done working with the data in a database, the next step is usually visualizing it or creating reports.&lt;br&gt;
SQL gives you a direct pipeline into visualization tools like Power BI and Tableau.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuq3jt9fx9g072vxoj3n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuq3jt9fx9g072vxoj3n.jpeg" alt="Image description" width="736" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6. SQL ensures collaboration across teams.
&lt;/h3&gt;

&lt;p&gt;In most organizations, SQL is a central, shared language in the data department. Data analysts, engineers, and scientists often collaborate through data pipelines and, above all, databases. Databases without SQL? That doesn't make sense!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e0adakjpl7a5kuyz52f.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e0adakjpl7a5kuyz52f.jpeg" alt="Image description" width="736" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SQL is more than just a language. It's a critical, domain-relevant skill that every analyst who works with databases should learn.&lt;/p&gt;

&lt;p&gt;If you haven't started learning SQL yet, now is the perfect time!!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Ultimate Guide to Apache Kafka: Basics, Architecture, and Core Concepts</title>
      <dc:creator>Eric Katumo</dc:creator>
      <pubDate>Mon, 10 Mar 2025 12:16:48 +0000</pubDate>
      <link>https://dev.to/luxdevhq/the-ultimate-guide-to-apache-kafka-31ce</link>
      <guid>https://dev.to/luxdevhq/the-ultimate-guide-to-apache-kafka-31ce</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60pq9ixqqlfqpagvtc0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60pq9ixqqlfqpagvtc0j.png" alt="Image description" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Apache Kafka is a widely used open-source platform for distributed event streaming, supporting high-performance data pipelines, streaming analytics, data integration, and mission-critical applications across thousands of companies (&lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;https://kafka.apache.org/&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Originally developed by LinkedIn, Kafka is renowned for its high throughput, scalability and durability. It enables real-time data processing and is a key component in modern event-driven architectures.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Kafka Architecture&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Brokers&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A single Kafka server is called a Kafka Broker. Each broker operates as an independent process on a distinct machine, communicating with other brokers via a reliable and high-speed network.&lt;/p&gt;

&lt;p&gt;There can be any number of brokers, but three is typically treated as a practical minimum: it allows one broker to be taken out for maintenance and another to fail at the same time.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Producers&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A Producer is a client application that publishes (writes) events to a Kafka cluster. The Kafka producer is responsible for creating messages of the appropriate structure and sending them using the Kafka protocol. It has several configuration options to control message creation and delivery.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Consumers&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The Kafka consumer works by issuing “fetch” requests to the brokers leading the partitions it wants to consume. The consumer receives back a chunk of log that contains all of the messages in that topic beginning from the offset position.&lt;/p&gt;

&lt;p&gt;A consumer group is a set of consumers that cooperate to consume data from one or more topics.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Topic&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Kafka topics are the categories used to organize messages. Topics are handled by Kafka as independent queues. This means that a consumer can subscribe to a specific topic and only receive messages marked with that topic.&lt;/p&gt;

&lt;p&gt;In Kafka, topics are partitioned and replicated across brokers throughout the implementation. Brokers are the individual nodes in a Kafka cluster.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Cluster&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Kafka clusters are a group of interconnected Kafka brokers that work together to manage the data streams entering and leaving a Kafka system. As user activity increases, so does the need for additional Kafka brokers to cope with the volume and velocity of the incoming data streams.&lt;/p&gt;

&lt;p&gt;Kafka clusters enable the replication of data partitions across multiple brokers, ensuring high availability even in the case of node failures. Your data pipeline remains robust and responsive to fluctuating demand.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Partitions&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Partitions are essential components within Kafka's distributed architecture that enable Kafka to scale horizontally, allowing for efficient parallel data processing. They are the building blocks for organizing and distributing data across the Kafka cluster.&lt;/p&gt;

&lt;p&gt;Each partition can have multiple replicas spread across different brokers, guaranteeing fault tolerance and data redundancy.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;KRaft&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;KRaft is the consensus protocol introduced in KIP-500 to remove Apache Kafka’s dependency on ZooKeeper for metadata management. It leverages the Raft consensus algorithm to manage metadata and handle leader election natively within Kafka.&lt;/p&gt;

&lt;p&gt;This eliminates the dependency on an external coordination system, allowing Kafka to function as a self-contained system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Core Concepts&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Events&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Picture a digital logbook that keeps track of everything important happening in your system. This logbook is filled with "events" – a record of each significant action or change. In Kafka, these events are the core of how data is stored and shared.&lt;/p&gt;

&lt;p&gt;Think of an event as a single entry in this logbook, with a few key pieces of information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key:&lt;/strong&gt; A unique identifier for the event. This helps you categorize or group related events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value:&lt;/strong&gt; The actual details of what happened.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timestamp:&lt;/strong&gt; When the event occurred.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headers (optional):&lt;/strong&gt; Extra bits of information about the event.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Replication&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Replication is the process of having multiple copies of the data for the sole purpose of availability in case one of the brokers goes down and is unavailable to serve the requests. Copies of the partition are maintained at multiple broker instances using the partition’s write-ahead log.&lt;/p&gt;

&lt;p&gt;The write-ahead log is where all the messages for that partition are stored in order. The messages are identified by the unique offset.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Offsets&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The consumer offset is a way of tracking the sequential order in which messages are received by Kafka topics.  Keeping track of the offset, or position, is important for nearly all Kafka use cases and can be an absolute necessity in certain instances, such as financial services.&lt;/p&gt;

&lt;p&gt;The Kafka consumer offset allows processing to continue from where it last left off if the stream application is turned off or if there is an unexpected failure. In other words, by having the offsets persist in a data store, data continuity is retained even when the stream application shuts down or fails.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Consumer Groups&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Consumer groups allow Kafka consumers to work together and process events from a topic in parallel. Each topic consists of one or more partitions. When a new consumer is started it will join a consumer group (this happens under the hood) and Kafka will then ensure that each partition is consumed by only one consumer from that group.&lt;/p&gt;

&lt;p&gt;So, if you have a topic with two partitions and only one consumer in a group, that consumer would consume records from both partitions. After another consumer joins the same group, each consumer would continue consuming only one partition.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Retention&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Kafka retention provides the ability to control the size of the Topic logs and avoid outgrowing the existing disk size. The retention can be configured or controlled based on the size of the logs (log retention) or based on the configured duration (Time Based Retention).&lt;/p&gt;

&lt;p&gt;Also, the same retention can be set across all the Kafka topics or it can be configured per topic, depending on the nature of the topic we can set the retention accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Apache Kafka is a powerful distributed streaming platform that combines messaging and storage capabilities. Its architecture, featuring brokers, topics, and partitions, delivers scalability and fault tolerance. Kafka's core concepts, such as consumer groups and offsets, enable efficient and reliable stream processing.&lt;/p&gt;

&lt;p&gt;With its ability to handle high-volume data streams and support real-time applications, Kafka is a crucial component of modern data architectures. It empowers developers to build robust, data-driven applications that address the challenges of today's data-intensive world. To get started with Apache Kafka visit &lt;a href="https://kafka.apache.org/quickstart" rel="noopener noreferrer"&gt;https://kafka.apache.org/quickstart&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>datascience</category>
      <category>data</category>
    </item>
    <item>
      <title>Introducing LuxDevHQ Data Science, Artificial Intelligence, and Analytics Prep Program</title>
      <dc:creator>Mwenda Harun Mbaabu</dc:creator>
      <pubDate>Sun, 02 Mar 2025 19:28:44 +0000</pubDate>
      <link>https://dev.to/luxdevhq/introducing-luxdevhq-data-science-artificial-intelligence-and-analytics-prep-program-43fe</link>
      <guid>https://dev.to/luxdevhq/introducing-luxdevhq-data-science-artificial-intelligence-and-analytics-prep-program-43fe</guid>
      <description>&lt;p&gt;&lt;strong&gt;LuxDevHQ&lt;/strong&gt; is excited to introduce an intensive 6 weeks Data Science, Artificial Intelligence, and Analytics Prep Program designed to equip learners with essential technical skills and real-world project experience.&lt;/p&gt;

&lt;p&gt;This rigorous and relentless training is designed to push participants to their limits, ensuring they gain hands-on experience in key data technologies. Unlike other &lt;strong&gt;online&lt;/strong&gt; or &lt;strong&gt;hybrid programs&lt;/strong&gt;, this fully &lt;strong&gt;onsite training&lt;/strong&gt; will require learners to attend &lt;strong&gt;physical classes daily&lt;/strong&gt; from &lt;strong&gt;9:00 AM&lt;/strong&gt; to &lt;strong&gt;4:00 PM EAT&lt;/strong&gt; at LuxDevHQ’s Kilimani campus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;em&gt;Registration for this program ends on 31st March 2025, and training begins on 21st April 2025.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Program Fees&lt;/strong&gt;&lt;br&gt;
The total cost for the 6-week intensive program is 10,000 KES. This fee covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full access to all training materials and resources&lt;/li&gt;
&lt;li&gt;Hands-on mentorship from industry professionals&lt;/li&gt;
&lt;li&gt;Participation in real-world projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;It is important to note that all our programs include a personalized skills assessment test, which is a 30-minute virtual or in-person call with one of our trainers. This assessment helps determine your skill level to ensure you are placed in a program where you will benefit the most as a participant or learner. The test costs 500 KES, which is refundable if the applicant does not pass the entrance assessment.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The program is divided into two key phases:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Core Training (Weeks 1-2)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first two weeks focus on developing proficiency in the most essential data tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Excel&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data organization, manipulation, and cleaning techniques&lt;/li&gt;
&lt;li&gt;Advanced formulas and functions (VLOOKUP, INDEX/MATCH, Pivot Tables)&lt;/li&gt;
&lt;li&gt;Data visualization using charts and graphs&lt;/li&gt;
&lt;li&gt;Macros and automation for efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduction to relational databases and SQL syntax&lt;/li&gt;
&lt;li&gt;Writing queries to extract, filter, and aggregate data&lt;/li&gt;
&lt;li&gt;Advanced SQL functions: Joins, Subqueries, CTEs, and Window functions&lt;/li&gt;
&lt;li&gt;Database optimization and indexing for performance tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Python for Data Science &amp;amp; Analytics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python programming fundamentals (variables, loops, functions, OOP)&lt;/li&gt;
&lt;li&gt;Data analysis with Pandas (data wrangling, transformations, merging datasets)&lt;/li&gt;
&lt;li&gt;Data visualization using Matplotlib and Seaborn&lt;/li&gt;
&lt;li&gt;Numerical computations using NumPy&lt;/li&gt;
&lt;li&gt;Introduction to Machine Learning using Scikit-Learn&lt;/li&gt;
&lt;li&gt;Automated data workflows with Apache Airflow&lt;/li&gt;
&lt;li&gt;Implementing supervised and unsupervised machine learning models&lt;/li&gt;
&lt;li&gt;Using ML libraries such as TensorFlow, PyTorch, and XGBoost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI for Business Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data connection and transformation using Power Query&lt;/li&gt;
&lt;li&gt;Data modeling and DAX (Data Analysis Expressions)&lt;/li&gt;
&lt;li&gt;Creating interactive dashboards and reports&lt;/li&gt;
&lt;li&gt;Sharing and publishing reports for business insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Real-World Project Work (Weeks 3-6)&lt;/strong&gt;&lt;br&gt;
After gaining foundational skills, participants will spend the remaining four weeks applying their knowledge to a real-world, industry-based project. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learners will:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work with real datasets sourced from industry partners&lt;/li&gt;
&lt;li&gt;Solve practical business and AI-driven problems using machine learning and analytical methods&lt;/li&gt;
&lt;li&gt;Collaborate in teams, simulating professional work environments&lt;/li&gt;
&lt;li&gt;Develop a complete data science pipeline (data collection, cleaning, modeling, and visualization)&lt;/li&gt;
&lt;li&gt;Deploy models and dashboards using cloud-based services or on-premise solutions&lt;/li&gt;
&lt;li&gt;Present findings and insights in a structured report and deliver a professional presentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Enroll in This Prep Program?&lt;/strong&gt;&lt;br&gt;
  ✅ Hands-on, immersive learning with real-world applications&lt;br&gt;
  ✅ Intense and focused training for rapid skill development&lt;br&gt;
  ✅ Industry-aligned curriculum designed for immediate applicability&lt;br&gt;
  ✅ Collaboration with peers and expert mentorship&lt;br&gt;
  ✅ A stepping stone to advanced AI and Data Science programs&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commitment and Expectations.&lt;/strong&gt;&lt;br&gt;
This program is not for the faint-hearted. It is an intensive and highly structured learning experience that demands full commitment. Participants should be prepared for a rigorous schedule that mirrors real-world industry expectations.&lt;/p&gt;

&lt;p&gt;Are you ready to challenge yourself and take the first step toward a career in AI, Data Science, and Analytics?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Join us at LuxDevHQ and transform your skills in just six weeks.&lt;/strong&gt; You can register for the program here, &lt;a href="https://forms.gle/jVFV79CKtPLXTx7W9" rel="noopener noreferrer"&gt;https://forms.gle/jVFV79CKtPLXTx7W9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For inquiries and enrollment details, visit our campus on Lenana Road, Kilimani, email us at &lt;a href="mailto:info@luxdevhq.com"&gt;info@luxdevhq.com&lt;/a&gt;, or call or WhatsApp us on 0798166628.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>LuxDevHQ Night of Code - A 20+ Hour Learning, Building, and Networking Event for LuxDevHQ Community.</title>
      <dc:creator>Mwenda Harun Mbaabu</dc:creator>
      <pubDate>Fri, 07 Feb 2025 14:18:37 +0000</pubDate>
      <link>https://dev.to/luxdevhq/luxdevhq-night-of-code-a-20-hour-learning-building-and-networking-event-for-luxdevhq-4lg8</link>
      <guid>https://dev.to/luxdevhq/luxdevhq-night-of-code-a-20-hour-learning-building-and-networking-event-for-luxdevhq-4lg8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgoyyo2vvtllf59psx6s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgoyyo2vvtllf59psx6s.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LuxDevHQ&lt;/strong&gt; is a learning and upskilling platform specializing in data technologies and cloud solutions, dedicated to helping individuals and businesses stay ahead in the fast-evolving data and artificial intelligence landscape. &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;&lt;a href="https://x.com/LuxDevHQ" rel="noopener noreferrer"&gt;LuxDevHQ&lt;/a&gt; operates in two major capacities:&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;✅ &lt;strong&gt;A Cutting-Edge Training Hub,&lt;/strong&gt; offering an intensive, hands-on 6-month boot camp that prepares learners for real-world careers in data science, engineering, analytics, and cloud computing.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;A Data Talent Sourcing Platform,&lt;/strong&gt; connecting businesses with verified, top-tier data and cloud engineering professionals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🎓 LuxDevHQ BootCamp&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LuxDevHQ boot camp is an intensive program designed to equip learners with industry-ready skills through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;4 months of expert-led training covering SQL, Python, Big Data, and Cloud Technologies.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;2 months of hands-on internship, where participants apply their skills in real-world projects.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structure bridges the gap between learning and employment by providing mentorship, live projects, and career support to ensure graduates are job-ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;LuxDevHQ Night of Code.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This event is a 20+ hour experience focused on learning, building, and networking. It is designed for enrolled LuxDevHQ students (and open to the public) to practice their skills and familiarize themselves with the technologies they will be learning in the program and applying in the field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nzn6rteb4a0see07g68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nzn6rteb4a0see07g68.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are excited to host &lt;strong&gt;Stephen Kolesh,&lt;/strong&gt; the top-ranked &lt;a href="https://zindi.africa/community" rel="noopener noreferrer"&gt;Zindi competitor (currently #1 on the leaderboard)&lt;/a&gt;. He will guide attendees in getting started with competitive programming, helping them develop valuable skills while also exploring ways to earn money through competitions.&lt;/p&gt;

&lt;p&gt;Additionally, we will provide CV and résumé workshops, ensuring each attendee receives personalized guidance on how to enhance their profiles and make them stand out in the job market.&lt;/p&gt;

&lt;p&gt;Among the guests are freelancers and industry professionals, who will offer insights into what to expect in the field and share valuable career advice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can register to participate using the link below:&lt;/strong&gt; - &lt;a href="https://paydexp.com/l/31G26" rel="noopener noreferrer"&gt;https://paydexp.com/l/31G26&lt;/a&gt; &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Understanding state and props in React</title>
      <dc:creator>Sheila Kabiro</dc:creator>
      <pubDate>Fri, 02 Feb 2024 09:01:27 +0000</pubDate>
      <link>https://dev.to/luxdevhq/understanding-state-and-props-in-react-4hfa</link>
      <guid>https://dev.to/luxdevhq/understanding-state-and-props-in-react-4hfa</guid>
      <description>&lt;p&gt;State and props are two essential ideas in React, and they are crucial to creating dynamic and interactive user interfaces. React's component-based architecture is built on these concepts, which allow developers to write reusable, modular, and maintainable code. In this article, we'll look at what state and props are, how they differ, and how they work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Role of State:&lt;br&gt;
What is State?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At its core, state in React represents the local mutable data that a component manages. It is what allows a component to keep track of information that changes over time, whether from user interactions, API responses, or any other dynamic aspect of the application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Class Components and State:&lt;/strong&gt;&lt;br&gt;
Traditionally, class components were the primary way to manage state in React. The &lt;code&gt;setState&lt;/code&gt;  method played a crucial role in updating and re-rendering components based on changes in state. Understanding the component lifecycle methods became essential to grasp the intricacies of managing state effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introducing Hooks:&lt;/strong&gt;&lt;br&gt;
With the advent of React Hooks, especially the &lt;code&gt;useState&lt;/code&gt; hook, functional components gained the ability to manage state. This simplified the state management process, making code cleaner and more concise. No longer bound to class components, developers could harness the power of state within functional components effortlessly.&lt;/p&gt;
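
&lt;p&gt;To make this concrete, here is a minimal sketch of a counter built with the &lt;code&gt;useState&lt;/code&gt; hook (the component and variable names are just illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import { useState } from 'react';

// A functional component that manages its own state.
function Counter() {
  // useState returns the current value and a function to update it.
  const [count, setCount] = useState(0);

  // Calling the updater triggers a re-render with the new value.
  return (
    &amp;lt;button onClick={() =&amp;gt; setCount(count + 1)}&amp;gt;
      Clicked {count} times
    &amp;lt;/button&amp;gt;
  );
}

export default Counter;
&lt;/code&gt;&lt;/pre&gt;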

&lt;p&gt;&lt;strong&gt;Key characteristics of state:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Internal to a component.&lt;/li&gt;
&lt;li&gt;Mutable: can be changed using &lt;code&gt;setState&lt;/code&gt; (or the updater function returned by &lt;code&gt;useState&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Represents dynamic data that affects the component's behavior.&lt;/li&gt;
&lt;li&gt;Example: Clicking a button in the Counter component might call setState to increment the count.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Significance of Props:&lt;br&gt;
What are Props?&lt;/strong&gt;&lt;br&gt;
Props, short for properties, allow components to receive data from their parent components. They are immutable and serve as a way to pass information down the component tree.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passing and Receiving Props:&lt;/strong&gt;&lt;br&gt;
Props are passed from parent to child components, creating a flow of data within the application. This unidirectional data flow ensures that child components remain predictable and easily maintainable.&lt;/p&gt;
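
&lt;p&gt;As a small, hypothetical illustration, a parent component passes a value down and the child simply renders what it receives:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// The child reads data from props; it never modifies them.
function Greeting({ name }) {
  return &amp;lt;h1&amp;gt;Hello, {name}!&amp;lt;/h1&amp;gt;;
}

// The parent decides what data flows down the tree.
function App() {
  return &amp;lt;Greeting name="Sheila" /&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;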

&lt;p&gt;&lt;strong&gt;Key characteristics of props:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Passed from parent to child components.&lt;/li&gt;
&lt;li&gt;Read-only.&lt;/li&gt;
&lt;li&gt;Used to configure a component's behavior or appearance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Yin and Yang of React Data Flow&lt;br&gt;
Interaction Between State and Props&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Props and state work together seamlessly to create responsive and dynamic user interfaces. Props provide external configuration, while state manages internal data changes. Here's a helpful analogy:&lt;/p&gt;

&lt;p&gt;Props are like the recipe: Just as a recipe provides instructions on how to bake a cake, props in React provide the necessary information for a component. They are passed down from parent components to child components and are immutable within the component receiving them. Props define what a component should render and how it should behave based on external input.&lt;/p&gt;

&lt;p&gt;State is like the ingredients: While the recipe (props) provides the blueprint, the ingredients (state) are the dynamic elements that can change the outcome of the cake. State is internal to a component and represents its current condition or situation. It can be modified by the component itself through setState(), leading to re-rendering and updates in the component's UI.&lt;/p&gt;
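
&lt;p&gt;Putting the analogy into code, here is a minimal sketch (the component names are made up) in which the parent owns the state and hands it to the child as props:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import { useState } from 'react';

// The parent owns the state (the "ingredients")...
function CakeShop() {
  const [flavor, setFlavor] = useState('vanilla');

  return (
    &amp;lt;div&amp;gt;
      {/* ...and passes it down as props (the "recipe"). */}
      &amp;lt;Cake flavor={flavor} /&amp;gt;
      &amp;lt;button onClick={() =&amp;gt; setFlavor('chocolate')}&amp;gt;
        Switch flavor
      &amp;lt;/button&amp;gt;
    &amp;lt;/div&amp;gt;
  );
}

// The child only reads the prop it is given.
function Cake({ flavor }) {
  return &amp;lt;p&amp;gt;This cake is {flavor}.&amp;lt;/p&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;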

&lt;p&gt;&lt;strong&gt;Common Use Cases: When to Use Props vs. State&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use props for data that comes from a parent component and shouldn't be changed by the child.&lt;/li&gt;
&lt;li&gt;Use state for data that is internal to the component and can change over time.&lt;/li&gt;
&lt;li&gt;Props are ideal for static configuration, while state is perfect for dynamic behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyone starting to build modern web applications needs to understand state and props in React. By grasping these concepts, developers can design dynamic, interactive, and maintainable user interfaces. Remember that state and props are your friends as you begin your React journey; they will help you build reliable, scalable applications with ease. Happy coding!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>react</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Maximizing Efficiency and Savings: A Guide to Optimizing Amazon Redshift</title>
      <dc:creator>Mwenda Harun Mbaabu</dc:creator>
      <pubDate>Mon, 11 Sep 2023 16:14:12 +0000</pubDate>
      <link>https://dev.to/luxdevhq/maximizing-efficiency-and-savings-a-guide-to-optimizing-amazon-redshift-3ddl</link>
      <guid>https://dev.to/luxdevhq/maximizing-efficiency-and-savings-a-guide-to-optimizing-amazon-redshift-3ddl</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsua77i6345opv9ybeko.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsua77i6345opv9ybeko.png" alt="Image description" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt; serves as a robust data warehousing service that assumes a pivotal role in the management of large-scale data analytics for organizations.&lt;/p&gt;

&lt;p&gt;As a data engineer, you will work closely with Amazon Redshift if your company has chosen it as its data warehousing technology, or if your organization has adopted it as the central data lakehouse tool to combine the advantages of a data lake and a warehouse in a unified platform.&lt;/p&gt;

&lt;p&gt;To fully exploit the capabilities of Redshift while concurrently managing costs and ensuring the efficiency of query performance, optimization becomes imperative. &lt;/p&gt;

&lt;p&gt;In this article, we will delve into a set of strategies designed to assist you in optimizing Amazon Redshift for both cost-effectiveness and query performance. This endeavor will not only result in cost savings for your organization but also enhance query speed, benefiting you as a developer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We will be discussing several strategies, including:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data Modeling&lt;/li&gt;
&lt;li&gt;Data Loading&lt;/li&gt;
&lt;li&gt;Compression&lt;/li&gt;
&lt;li&gt;Query Optimization&lt;/li&gt;
&lt;li&gt;Concurrency Scaling&lt;/li&gt;
&lt;li&gt;Workload Management (WLM)&lt;/li&gt;
&lt;li&gt;Partitioning&lt;/li&gt;
&lt;li&gt;Vacuuming and Analyzing&lt;/li&gt;
&lt;li&gt;Monitoring and Alerts&lt;/li&gt;
&lt;li&gt;Redshift Spectrum&lt;/li&gt;
&lt;li&gt;Redshift Advisor and Reserved Instances &lt;/li&gt;
&lt;li&gt;Regular Review and Optimization&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1). Data Modeling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The foundation of effective Redshift optimization begins with smart data modeling decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data Distribution and Sort Keys: The choice of data distribution style (even, key, or all) and sort keys for your tables can significantly impact query performance. It's essential to select these attributes thoughtfully based on your specific needs (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Normalization vs. Denormalization: Evaluate your query patterns to decide whether to normalize or denormalize your data. Normalization conserves storage space, while denormalization can enhance query performance. Your choice should align with your unique requirements.&lt;/li&gt;
&lt;/ul&gt;
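
&lt;p&gt;As a minimal sketch of the distribution and sort key point above (the table and column names are hypothetical), a fact table that is usually joined on &lt;code&gt;customer_id&lt;/code&gt; and filtered by date might be defined like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical fact table: co-locate rows by the common join key
-- and sort by date so range filters scan fewer blocks.
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    store_id    INT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (sale_date);
&lt;/code&gt;&lt;/pre&gt;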

&lt;h3&gt;
  
  
  &lt;strong&gt;2). Data Loading&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Efficient data loading processes are crucial for Redshift optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COPY Command: Utilize the COPY command for bulk data loading instead of INSERT operations. It is not only faster but also more cost-effective, particularly when dealing with substantial data volumes (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Amazon S3 Staging: Consider using Amazon S3 as a staging area for data loading. This approach simplifies the process and reduces load times, enhancing overall efficiency.&lt;/li&gt;
&lt;/ul&gt;
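
&lt;p&gt;A minimal COPY sketch, assuming the files have already been staged in S3 (the bucket, prefix, and IAM role below are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Bulk-load staged CSV files in parallel instead of row-by-row INSERTs.
COPY sales
FROM 's3://my-bucket/staging/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1
REGION 'us-east-1';
&lt;/code&gt;&lt;/pre&gt;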

&lt;h3&gt;
  
  
  &lt;strong&gt;3). Compression&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Optimizing storage with proper compression techniques can lead to substantial savings and improved query performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compression Encodings: Employ suitable compression encodings for columns to save storage costs and boost query performance. Selecting the right encodings is key to success (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;ANALYZE Command: Run the ANALYZE command periodically to update statistics. This aids the query planner in making informed decisions regarding data distribution and compression.&lt;/li&gt;
&lt;/ul&gt;
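
&lt;p&gt;A small sketch of both points; the encodings shown are illustrative, and running &lt;code&gt;ANALYZE COMPRESSION&lt;/code&gt; on your own data can suggest better ones:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Column-level compression encodings chosen per data type.
CREATE TABLE sales_compressed (
    sale_id     BIGINT        ENCODE az64,
    product     VARCHAR(100)  ENCODE lzo,
    sale_date   DATE          ENCODE az64,
    amount      DECIMAL(12,2) ENCODE az64
);

-- Refresh planner statistics after large loads.
ANALYZE sales_compressed;
&lt;/code&gt;&lt;/pre&gt;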

&lt;h3&gt;
  
  
  &lt;strong&gt;4). Query Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Fine-tuning your queries can significantly impact performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EXPLAIN Command: Use the EXPLAIN command to analyze query plans and identify performance bottlenecks. This helps in pinpointing areas that require optimization (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Column Selection: Avoid using SELECT * in queries; instead, explicitly list the columns you need. This reduces unnecessary data transfer and computation.&lt;/li&gt;
&lt;li&gt;Minimize DISTINCT and ORDER BY: Minimize the use of DISTINCT and ORDER BY clauses, as they can be computationally expensive. Use them only when necessary.&lt;/li&gt;
&lt;/ul&gt;
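
&lt;p&gt;For example, a quick way to check a plan while keeping the column list explicit (the table and columns are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Inspect the plan (joins, distribution, scans) before tuning.
EXPLAIN
SELECT customer_id, SUM(amount) AS total_spend
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY customer_id;
&lt;/code&gt;&lt;/pre&gt;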

&lt;h3&gt;
  
  
  &lt;strong&gt;5). Concurrency Scaling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Efficiently managing query concurrency is vital:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic Concurrency Scaling: Enable automatic concurrency scaling to handle query load spikes without sacrificing performance.&lt;/li&gt;
&lt;li&gt;Custom Concurrency Settings: Adjust concurrency scaling settings based on your workload and requirements, striking the right balance between cost and performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6). Workload Management (WLM)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Effectively allocate resources among different query workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WLM Queues: Utilize WLM queues to distribute resources efficiently. Set appropriate memory and concurrency values for each queue to optimize both cost and performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7). Partitioning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For large tables with specific query patterns, partitioning is a game-changer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table Partitioning: Native Redshift tables do not support traditional declarative partitioning; you achieve a similar effect by sorting large tables on the date column you filter by, or by partitioning external tables queried through Redshift Spectrum. Apply this when you frequently query specific date ranges or subsets of data to enhance query performance and reduce costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;8). Vacuuming and Analyzing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Maintenance tasks are essential for long-term optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VACUUM and ANALYZE: Regularly run the VACUUM and ANALYZE commands to reclaim storage space and keep statistics up-to-date, ensuring peak performance (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
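
&lt;p&gt;A minimal maintenance sketch for a frequently updated table (the table name is a placeholder; scheduling these commands off-peak is common practice):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Reclaim space left by deletes and updates, and re-sort the rows...
VACUUM FULL sales;

-- ...then refresh the statistics the query planner relies on.
ANALYZE sales;
&lt;/code&gt;&lt;/pre&gt;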

&lt;h3&gt;
  
  
  &lt;strong&gt;9). Monitoring and Alerts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stay proactive with monitoring and alert systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring Tools: Implement monitoring and set up alerts to track query performance and resource utilization. Services like Amazon CloudWatch can be invaluable for this purpose.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;10). Redshift Spectrum&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Leverage Redshift Spectrum for cost-effective data querying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon S3 Integration: Consider using Redshift Spectrum to query data stored in Amazon S3 directly, especially for historical or less-frequently accessed data. This can significantly reduce storage costs (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
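
&lt;p&gt;A brief sketch of querying S3 data in place via Spectrum; the Glue database, IAM role, and S3 path below are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Expose an external (Glue Data Catalog) database as a schema.
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'sales_history'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Describe archived files in S3 without loading them into the cluster.
CREATE EXTERNAL TABLE spectrum.sales_archive (
    sale_id   BIGINT,
    sale_date DATE,
    amount    DECIMAL(12,2)
)
STORED AS PARQUET
LOCATION 's3://my-bucket/archive/sales/';

-- Cold data stays in S3 but is queryable with ordinary SQL.
SELECT COUNT(*) FROM spectrum.sales_archive;
&lt;/code&gt;&lt;/pre&gt;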

&lt;h3&gt;
  
  
&lt;strong&gt;11). Redshift Advisor and Reserved Instances&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Utilize built-in tools for guidance and cost savings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redshift Advisor:&lt;/strong&gt; Take advantage of the Redshift Advisor tool, which provides recommendations for optimizing your cluster's performance and cost-efficiency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reserved Instances (RIs):&lt;/strong&gt; If your Redshift usage is steady, consider purchasing Reserved Instances to lower your per-hour costs, providing predictability and savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;12). Regular Review and Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Continuous improvement is the key to success:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance and Cost Metrics:&lt;/strong&gt; Regularly review your cluster's performance and cost metrics to identify opportunities for optimization. Adapting to changing needs is crucial.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Optimizing Amazon Redshift for cost and query performance is not a one-time task but rather an &lt;strong&gt;ongoing journey&lt;/strong&gt; that requires a deep understanding of your data, workload, and business objectives. By implementing the strategies mentioned in this article and staying vigilant, you can continuously fine-tune your Redshift cluster to strike the right balance between cost savings and efficient data analytics. This iterative process ensures that your organization maximizes the benefits of this powerful data warehousing service, adapting to evolving needs and extracting valuable insights from your data.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
