<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kenyansa Felix Amenya</title>
    <description>The latest articles on DEV Community by Kenyansa Felix Amenya (@kenyansa).</description>
    <link>https://dev.to/kenyansa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F972920%2F6e426540-7b43-4cd6-ba0e-1bde6fc64f2e.png</url>
      <title>DEV Community: Kenyansa Felix Amenya</title>
      <link>https://dev.to/kenyansa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kenyansa"/>
    <language>en</language>
    <item>
      <title>Sales Prediction API – FastAPI Deployment</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Tue, 10 Feb 2026 13:06:12 +0000</pubDate>
      <link>https://dev.to/kenyansa/sales-prediction-api-fastapi-deployment-4c7b</link>
      <guid>https://dev.to/kenyansa/sales-prediction-api-fastapi-deployment-4c7b</guid>
      <description>&lt;h1&gt;
  
  
  Sales Prediction API – FastAPI Deployment
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Project Overview
&lt;/h2&gt;

&lt;p&gt;This project demonstrates how to deploy a trained sales prediction model as a RESTful API using &lt;strong&gt;FastAPI&lt;/strong&gt;. The API allows users to send business-related inputs and receive predicted sales values in real time. The solution bridges the gap between data analysis/modeling and real-world application by making predictions accessible to other systems such as dashboards, web apps, or backend services.&lt;/p&gt;

&lt;p&gt;The focus of this project is &lt;strong&gt;model deployment and API design&lt;/strong&gt;, not model training.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;Retail businesses need reliable sales forecasts to support decision-making in areas such as marketing, inventory planning, and operational scaling. This API predicts total sales based on key drivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Marketing spend&lt;/li&gt;
&lt;li&gt;Store size&lt;/li&gt;
&lt;li&gt;Seasonal timing (month number)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to expose the prediction logic through a clean, validated, and easy-to-use API.&lt;/p&gt;




&lt;h2&gt;
  
  
  Model and Data Description
&lt;/h2&gt;

&lt;p&gt;The model was trained offline using historical sales data. Since sales is a continuous numeric variable, the task was formulated as a &lt;strong&gt;regression problem&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Input Features
&lt;/h3&gt;

&lt;p&gt;The model uses the following features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Marketing_Spend (float):&lt;/strong&gt; Amount spent on marketing activities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store_Size (float):&lt;/strong&gt; Size or capacity indicator of the store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Month_Number (int):&lt;/strong&gt; Month of the year (used to capture seasonality)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Saved Artifacts
&lt;/h3&gt;

&lt;p&gt;After training and evaluation, the following files were saved using &lt;code&gt;joblib&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sales.joblib&lt;/code&gt; – the trained regression model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sales_features.joblib&lt;/code&gt; – ordered list of feature names used during training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Saving the feature list ensures consistency between training and prediction.&lt;/p&gt;
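&lt;p&gt;The training script is not shown here, but a minimal sketch of how such artifacts can be written and read back with &lt;code&gt;joblib&lt;/code&gt; looks like this. The &lt;code&gt;TinyModel&lt;/code&gt; class is a stand-in for the real trained regressor, which lives in the offline training code:&lt;/p&gt;

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for the trained regressor; any scikit-learn
# estimator would be dumped and loaded the same way.
class TinyModel:
    def predict(self, rows):
        return [sum(row) for row in rows]

feature_names = ["Marketing_Spend", "Store_Size", "Month_Number"]

out_dir = tempfile.mkdtemp()
joblib.dump(TinyModel(), os.path.join(out_dir, "sales.joblib"))
joblib.dump(feature_names, os.path.join(out_dir, "sales_features.joblib"))

# At API startup, the artifacts are read back in the same order
model = joblib.load(os.path.join(out_dir, "sales.joblib"))
names = joblib.load(os.path.join(out_dir, "sales_features.joblib"))
print(names)
```

&lt;p&gt;Because the feature list is persisted next to the model, the API can rebuild the exact column order the model was trained on.&lt;/p&gt;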




&lt;h2&gt;
  
  
  Why FastAPI?
&lt;/h2&gt;

&lt;p&gt;FastAPI is a modern Python framework optimized for building APIs. It was chosen for this project because it offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High performance and fast response times&lt;/li&gt;
&lt;li&gt;Automatic request validation using Pydantic&lt;/li&gt;
&lt;li&gt;Clear and interactive API documentation&lt;/li&gt;
&lt;li&gt;Simple and clean syntax&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These features make FastAPI especially suitable for deploying machine learning and statistical models.&lt;/p&gt;




&lt;h2&gt;
  
  
  API Architecture
&lt;/h2&gt;

&lt;p&gt;The FastAPI application performs the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loads the trained model and feature list at startup&lt;/li&gt;
&lt;li&gt;Validates incoming requests using a Pydantic schema&lt;/li&gt;
&lt;li&gt;Formats input data to match the model’s expected structure&lt;/li&gt;
&lt;li&gt;Generates predictions using the trained model&lt;/li&gt;
&lt;li&gt;Returns results as JSON responses&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  FastAPI Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Application Initialization
&lt;/h3&gt;

&lt;p&gt;The model and feature names are loaded once when the API starts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sales.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;feature_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sales_features.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FastAPI application is initialized with a descriptive title:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sales Prediction API&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Request Schema
&lt;/h2&gt;

&lt;p&gt;Incoming requests are validated using a Pydantic model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SalesFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;Marketing_Spend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;Store_Size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;Month_Number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that all required fields are present and correctly typed before prediction is performed.&lt;/p&gt;
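&lt;p&gt;As a standalone illustration (outside FastAPI), the schema accepts well-typed input and rejects a request with a non-numeric field before any prediction code runs; FastAPI converts such a failure into a 422 response automatically:&lt;/p&gt;

```python
from pydantic import BaseModel, ValidationError

class SalesFeatures(BaseModel):
    Marketing_Spend: float
    Store_Size: float
    Month_Number: int

# Correctly typed input is accepted (ints are coerced to floats)
ok = SalesFeatures(Marketing_Spend=50000, Store_Size=1200, Month_Number=6)
print(ok.Marketing_Spend)  # 50000.0

# A non-numeric value is rejected before it can reach the model
try:
    SalesFeatures(Marketing_Spend="a lot", Store_Size=1200, Month_Number=6)
except ValidationError as err:
    print("invalid field:", err.errors()[0]["loc"])
```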




&lt;h2&gt;
  
  
  API Endpoints
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Root Endpoint
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;GET /&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Used to verify that the API is running.&lt;/p&gt;

&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Welcome to the Sales Prediction API!!"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Prediction Endpoint
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;POST /predict&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Accepts sales-related inputs and returns a predicted sales value.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example Request
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Marketing_Spend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Store_Size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Month_Number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Example Response
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"predicted_sales"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;245000.75&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API internally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orders input features using the saved feature list&lt;/li&gt;
&lt;li&gt;Converts inputs into a NumPy array&lt;/li&gt;
&lt;li&gt;Generates a prediction using the trained model&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Testing the API
&lt;/h2&gt;

&lt;p&gt;FastAPI automatically provides interactive documentation at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:8000/docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This interface allows users to test endpoints, submit sample inputs, and view responses without additional tools.&lt;/p&gt;

&lt;p&gt;The API can also be tested using Postman or cURL for integration testing.&lt;/p&gt;
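&lt;p&gt;For example, assuming the server is running on the default uvicorn host and port, a cURL request might look like:&lt;/p&gt;

```shell
curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"Marketing_Spend": 50000, "Store_Size": 1200, "Month_Number": 6}'
```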




&lt;h2&gt;
  
  
  How to Run the API
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Install dependencies:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn joblib numpy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Start the server:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn main:sale &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Open the browser and navigate to:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http://127.0.0.1:8000/&lt;/code&gt; – API root&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://127.0.0.1:8000/docs&lt;/code&gt; – interactive documentation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project demonstrates how a trained sales prediction model can be deployed as a production-ready API using FastAPI. By combining strict input validation, efficient model loading, and clear endpoint design, the API enables reliable and scalable access to predictive insights. This approach allows organizations to integrate analytics directly into operational systems, turning data models into actionable business tools.&lt;/p&gt;

</description>
      <category>api</category>
      <category>fastapi</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>Ridge vs. Lasso Regression: A Clear Guide to Regularization Techniques</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 26 Jan 2026 18:56:16 +0000</pubDate>
      <link>https://dev.to/kenyansa/ridge-vs-lasso-regression-a-clear-guide-to-regularization-techniques-14nn</link>
      <guid>https://dev.to/kenyansa/ridge-vs-lasso-regression-a-clear-guide-to-regularization-techniques-14nn</guid>
      <description>&lt;p&gt;&lt;strong&gt;Ridge vs. Lasso Regression: A Clear Guide to Regularization Techniques&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the world of machine learning, linear regression is often one of the first algorithms we learn. But standard linear regression has a critical weakness: it can easily overfit to training data, especially when dealing with many features. This is where Ridge and Lasso regression come in—two powerful techniques that prevent overfitting and can lead to more interpretable models. Let's break down how they work, their differences, and when to use each.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;THE CORE PROBLEM: OVERFITTING&lt;/strong&gt;&lt;br&gt;
Imagine you're trying to predict house prices based on features like size, age, number of bedrooms, proximity to a school, and even the color of the front door. A standard linear regression might assign some weight (coefficient) to every single feature, even the irrelevant ones (like door color). It will fit the training data perfectly but will fail miserably on new, unseen houses. This is overfitting.&lt;/p&gt;

&lt;p&gt;Ridge and Lasso solve this by adding a "penalty" to the regression model's objective. This penalty discourages the model from relying too heavily on any single feature, effectively simplifying it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RIDGE REGRESSION: THE GENTLE MODERATOR&lt;/strong&gt;&lt;br&gt;
What it does: Ridge regression (also called L2 regularization) adds a penalty equal to the square of the magnitude of the coefficients.&lt;/p&gt;

&lt;p&gt;Simple Explanation: Think of Ridge as a strict but fair moderator in a group discussion. It allows everyone (every feature) to speak, but it prevents any single person from dominating the conversation. No feature's coefficient is allowed to become extremely large, but very few are ever set to zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Math (Simplified):&lt;/strong&gt;&lt;br&gt;
The Ridge model tries to minimize:&lt;br&gt;
&lt;code&gt;(Sum of Squared Errors) + λ * (Sum of Squared Coefficients)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Where λ (lambda) is the tuning parameter. A higher λ means a stronger penalty, pushing all coefficients closer to zero (but never exactly zero).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Predicting a student's final exam score (y) using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;x1: Hours studied (truly important)&lt;/li&gt;
&lt;li&gt;x2: Number of pencils they own (irrelevant noise)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A standard regression might output:&lt;br&gt;
&lt;code&gt;Score = 5.0*(Hours) + 0.3*(Pencils)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Ridge regression, with its penalty, might output:&lt;br&gt;
&lt;code&gt;Score = 4.8*(Hours) + 0.05*(Pencils)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;See what happened? The coefficient for the important feature (Hours) shrank slightly, and the coefficient for the nonsense feature (Pencils) shrank dramatically. The irrelevant feature is suppressed but not removed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LASSO REGRESSION: THE RUTHLESS SELECTOR&lt;/strong&gt;&lt;br&gt;
What it does: Lasso regression (also called L1 regularization) adds a penalty equal to the absolute value of the magnitude of the coefficients.&lt;/p&gt;

&lt;p&gt;Simple Explanation: Lasso is a ruthless talent scout. It evaluates all features and doesn't just quiet down the weak ones—it completely eliminates those it deems unnecessary. It performs feature selection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Math (Simplified):&lt;/strong&gt;&lt;br&gt;
The Lasso model tries to minimize:&lt;br&gt;
&lt;code&gt;(Sum of Squared Errors) + λ * (Sum of Absolute Coefficients)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Using the same student score prediction:&lt;/p&gt;

&lt;p&gt;A standard regression might output:&lt;br&gt;
&lt;code&gt;Score = 5.0*(Hours) + 0.3*(Pencils)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Lasso regression, with its penalty, might output:&lt;br&gt;
&lt;code&gt;Score = 4.9*(Hours) + 0.0*(Pencils)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The coefficient for Pencils has been forced to absolute zero. Lasso has identified it as useless and removed it from the model entirely, leaving a simpler, more interpretable model.&lt;/p&gt;
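&lt;p&gt;This behavior is easy to reproduce with scikit-learn on synthetic data mirroring the Hours/Pencils example. The &lt;code&gt;alpha&lt;/code&gt; values below are illustrative, not tuned:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(42)
n = 200
hours = rng.uniform(0, 10, n)
pencils = rng.uniform(0, 10, n)              # irrelevant noise feature
score = 5.0 * hours + rng.normal(0, 1.0, n)  # only hours drives the score

X = np.column_stack([hours, pencils])

ridge = Ridge(alpha=10.0).fit(X, score)
lasso = Lasso(alpha=1.0).fit(X, score)

print("Ridge:", ridge.coef_)  # both coefficients nonzero, pencils tiny
print("Lasso:", lasso.coef_)  # pencils coefficient driven to exactly zero
```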

&lt;p&gt;&lt;strong&gt;HEAD-TO-HEAD COMPARISON&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Feature&lt;/th&gt;&lt;th&gt;Ridge Regression&lt;/th&gt;&lt;th&gt;Lasso Regression&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Penalty Term&lt;/td&gt;&lt;td&gt;Sum of squared coefficients&lt;/td&gt;&lt;td&gt;Sum of absolute coefficients&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Effect on Coefficients&lt;/td&gt;&lt;td&gt;Shrinks them smoothly towards zero&lt;/td&gt;&lt;td&gt;Can force coefficients to exactly zero&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Feature Selection&lt;/td&gt;&lt;td&gt;No. Keeps all features.&lt;/td&gt;&lt;td&gt;Yes. Creates sparse models.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Use Case&lt;/td&gt;&lt;td&gt;When you believe all features are relevant but need to reduce overfitting&lt;/td&gt;&lt;td&gt;When you have many features and suspect only a subset are important&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Good For&lt;/td&gt;&lt;td&gt;Handling multicollinearity (highly correlated features)&lt;/td&gt;&lt;td&gt;Building simpler, more interpretable models&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Geometry&lt;/td&gt;&lt;td&gt;Penalty region is a circle; the solution tends to be where the error contour touches the circle&lt;/td&gt;&lt;td&gt;Penalty region is a diamond; the solution often occurs at a corner, zeroing out coefficients&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;VISUAL ANALOGY: THE FITTING GAME&lt;/strong&gt;&lt;br&gt;
Imagine you're fitting a curve to points on a graph, with two dials (coefficients) to adjust.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard Regression: You only care about getting the line as close to the points as possible. You might turn both dials to extreme positions to fit perfectly.&lt;/li&gt;
&lt;li&gt;Ridge: You have a second goal: you don't want the dials to point to very high numbers. You find a balance between fit and keeping the dial settings moderate.&lt;/li&gt;
&lt;li&gt;Lasso: You have a second goal: you want as few dials as possible to be far from the "off" position. You're willing to turn a dial all the way to "OFF" (zero) if it doesn't help enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;WHICH ONE SHOULD YOU USE?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose Ridge if you have many features that all have some meaningful relationship to the output. It’s often the safer, more stable choice.&lt;/li&gt;
&lt;li&gt;Choose Lasso if you're in an exploratory phase, have a huge number of features (e.g., hundreds of genes predicting a disease), and want to identify the most critical ones. The built-in feature selection is a huge advantage for interpretability.&lt;/li&gt;
&lt;li&gt;Pro-Tip: There's also Elastic Net, which combines both Ridge and Lasso penalties. It’s a great practical compromise that often delivers the best performance.&lt;/li&gt;
&lt;/ul&gt;
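&lt;p&gt;A minimal Elastic Net sketch with scikit-learn, where &lt;code&gt;l1_ratio&lt;/code&gt; controls the blend (1.0 behaves like Lasso, 0.0 like Ridge); the data and settings here are illustrative:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # only feature 0 matters

# alpha sets the overall penalty strength; l1_ratio blends L1 and L2
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)  # feature 0 dominates; noise features shrink toward zero
```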

&lt;p&gt;&lt;strong&gt;IN CONCLUSION&lt;/strong&gt;&lt;br&gt;
Both Ridge and Lasso are essential tools that move linear regression from a simple baseline to a robust, modern technique.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ridge regression is your go-to for general purpose prevention of overfitting. It's reliable and handles correlated data well.&lt;/li&gt;
&lt;li&gt;Lasso regression is your tool for creating simple, interpretable models by automatically selecting only the most important features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By understanding their distinct "philosophies"—moderation vs. selection—you can strategically choose the right tool to build models that are not only accurate but also generalize well to the real world.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Object-Oriented Programming</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Tue, 16 Dec 2025 19:23:17 +0000</pubDate>
      <link>https://dev.to/kenyansa/object-oriented-programming-5hb7</link>
      <guid>https://dev.to/kenyansa/object-oriented-programming-5hb7</guid>
      <description>&lt;p&gt;Guide to Understanding Classes in Object-Oriented Programming: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Class?&lt;/strong&gt;&lt;br&gt;
In Object-Oriented Programming (OOP), a class is like a blueprint or template for creating objects. Think of it as a cookie cutter that defines the shape and ingredients of cookies, while the cookies themselves are the objects. A class encapsulates data (attributes) and behaviors (methods) that operate on that data into a single, organized unit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Are Classes Useful?&lt;/strong&gt;&lt;br&gt;
Classes solve several programming challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Organization:&lt;/strong&gt; They group related data and functions together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reusability:&lt;/strong&gt; Once defined, a class can create multiple similar objects&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Abstraction:&lt;/strong&gt; They hide complex implementation details&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Modularity:&lt;/strong&gt; Different parts of a program can be developed independently&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-world modeling:&lt;/strong&gt; They help represent real-world entities in code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key Components: Attributes and Methods&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Attributes:&lt;/strong&gt; Variables that store data (the characteristics of an object)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Methods:&lt;/strong&gt; Functions that define behaviors or actions the object can perform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's use an example: a BankAccount class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class BankAccount:
    """A simple BankAccount class to demonstrate OOP concepts"""

    def __init__(self, account_number, owner, initial_balance=0):
        """
        Constructor method - called when creating a new BankAccount object
        :param account_number: Unique account identifier
        :param owner: Name of account owner
        :param initial_balance: Starting balance (defaults to 0)
        """
        # These are ATTRIBUTES (data/properties)
        self.account_number = account_number
        self.owner = owner
        self.balance = initial_balance
        print(f"Account created for {self.owner} with balance: ${self.balance:.2f}")

    # These are METHODS (behaviors/actions)

    def deposit(self, amount):
        """Add money to the account"""
        if amount &amp;gt; 0:
            self.balance += amount
            print(f"Deposited ${amount:.2f}. New balance: ${self.balance:.2f}")
        else:
            print("Deposit amount must be positive!")
        return self.balance

    def withdraw(self, amount):
        """Remove money from the account if sufficient funds exist"""
        if amount &amp;gt; 0:
            if amount &amp;lt;= self.balance:
                self.balance -= amount
                print(f"Withdrew ${amount:.2f}. New balance: ${self.balance:.2f}")
            else:
                print(f"Insufficient funds! Available: ${self.balance:.2f}")
        else:
            print("Withdrawal amount must be positive!")
        return self.balance

    def check_balance(self):
        """Return the current balance"""
        print(f"Account balance for {self.owner}: ${self.balance:.2f}")
        return self.balance

    def account_info(self):
        """Display all account information"""
        print("\n" + "="*40)
        print("ACCOUNT INFORMATION")
        print("="*40)
        print(f"Owner: {self.owner}")
        print(f"Account Number: {self.account_number}")
        print(f"Current Balance: ${self.balance:.2f}")
        print("="*40 + "\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Creating and Using Objects&lt;/strong&gt;&lt;br&gt;
Creating objects (instances) from the BankAccount class&lt;br&gt;
Each object is independent with its own data&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Alice's account with $1000 initial deposit&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;alice_account = BankAccount("ACC001", "Alice Johnson", 1000)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Bob's account with default $0 balance&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;bob_account = BankAccount("ACC002", "Bob Smith")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using methods on Alice's account&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;alice_account.deposit(500)      # Alice deposits $500&lt;br&gt;
alice_account.withdraw(200)     # Alice withdraws $200&lt;br&gt;
alice_account.check_balance()   # Check Alice's balance&lt;br&gt;
alice_account.account_info()    # Get full account info&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using methods on Bob's account&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;bob_account.deposit(100)        # Bob deposits $100&lt;br&gt;
bob_account.withdraw(50)        # Bob withdraws $50&lt;br&gt;
bob_account.withdraw(100)       # This should fail - insufficient funds&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each object maintains its own state&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;print(f"\nAlice's balance: ${alice_account.balance:.2f}")&lt;br&gt;
print(f"Bob's balance: ${bob_account.balance:.2f}")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct attribute access (though usually done through methods)&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;print(f"\nAlice's account number: {alice_account.account_number}")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Classes Structure Real-World Problems&lt;/strong&gt;&lt;br&gt;
Our BankAccount example demonstrates how classes help structure programming problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Encapsulation: All bank account-related data and operations are in one place&lt;/li&gt;
&lt;li&gt; State Management: Each account maintains its own balance independently&lt;/li&gt;
&lt;li&gt; Controlled Interaction: Methods like withdraw() include validation logic&lt;/li&gt;
&lt;li&gt; Clear Interface: Other parts of the program can use accounts without knowing internal details&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Extending the Class (Optional Challenge)&lt;/strong&gt;&lt;br&gt;
An extended version with additional features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class EnhancedBankAccount(BankAccount):
    def __init__(self, account_number, owner, initial_balance=0, account_type="Savings"):
        # Call parent class constructor
        super().__init__(account_number, owner, initial_balance)
        self.account_type = account_type
        self.transaction_history = []

    def deposit(self, amount):
        # Extend the parent method
        result = super().deposit(amount)
        self.transaction_history.append(f"Deposit: +${amount:.2f}")
        return result

    def withdraw(self, amount):
        result = super().withdraw(amount)
        self.transaction_history.append(f"Withdrawal: -${amount:.2f}")
        return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best Practices for Beginners&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Meaningful Names: Use descriptive names for classes, attributes, and methods&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Single Responsibility:&lt;/strong&gt; Each class should have one clear 
      purpose&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Use Methods for Actions:&lt;/strong&gt; Change attributes through methods 
       rather than directly&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Start Simple:&lt;/strong&gt; Begin with basic classes and add complexity 
       gradually&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Practice:&lt;/strong&gt; Create classes for everyday objects (Book, Car, 
       Student, etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Complete Example: bank_account_demo.py&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a complete BankAccount class example. Save it as &lt;code&gt;bank_account_demo.py&lt;/code&gt; and run it with Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class BankAccount:
    def __init__(self, account_number, owner, initial_balance=0):
        self.account_number = account_number
        self.owner = owner
        self.balance = initial_balance
        print(f"Account created for {self.owner} with balance: ${self.balance:.2f}")

    def deposit(self, amount):
        if amount &amp;gt; 0:
            self.balance += amount
            print(f"Deposited ${amount:.2f}. New balance: ${self.balance:.2f}")
        else:
            print("Deposit amount must be positive!")
        return self.balance

    def withdraw(self, amount):
        if amount &amp;gt; 0:
            if amount &amp;lt;= self.balance:
                self.balance -= amount
                print(f"Withdrew ${amount:.2f}. New balance: ${self.balance:.2f}")
            else:
                print(f"Insufficient funds! Available: ${self.balance:.2f}")
        else:
            print("Withdrawal amount must be positive!")
        return self.balance

    def check_balance(self):
        print(f"Account balance for {self.owner}: ${self.balance:.2f}")
        return self.balance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Main execution&lt;/strong&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    print("=== BANK ACCOUNT DEMONSTRATION ===\n")

    # Create accounts
    account1 = BankAccount("ACC001", "Alice Johnson", 1000)
    account2 = BankAccount("ACC002", "Bob Smith")

    print("\n=== Performing Transactions ===\n")

    # Account 1 transactions
    account1.deposit(500)
    account1.withdraw(200)

    # Account 2 transactions
    account2.deposit(300)
    account2.withdraw(100)
    account2.withdraw(250)  # Should fail

    print("\n=== Final Balances ===\n")
    account1.check_balance()
    account2.check_balance()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Classes are fundamental to Object-Oriented Programming because they provide a structured way to model real-world entities. By bundling related data and behaviors together, they make code more organized, reusable, and easier to understand. The BankAccount example demonstrates how even beginners can create useful, real-world simulations using classes. As you practice, you'll discover that classes are powerful tools for breaking down complex problems into manageable, logical units.&lt;br&gt;
Remember: A class defines what an object is (attributes) and what it does (methods), while objects are the actual instances you work with in your program.&lt;/p&gt;

</description>
      <category>oop</category>
    </item>
    <item>
      <title>Connecting Power BI to PostgreSQL (Localhost &amp; Aiven Cloud)</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 17 Nov 2025 19:51:03 +0000</pubDate>
      <link>https://dev.to/kenyansa/connecting-power-bi-to-postgresql-localhost-aiven-cloud-4ci0</link>
      <guid>https://dev.to/kenyansa/connecting-power-bi-to-postgresql-localhost-aiven-cloud-4ci0</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Complete Guide: Connecting Power BI to PostgreSQL (Localhost &amp;amp; Aiven Cloud)&lt;/strong&gt;&lt;br&gt;
In this article you will learn how to bridge your data visualization with PostgreSQL databases—whether running locally or in the cloud&lt;br&gt;
&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
Power BI has become the go-to business intelligence tool for millions of users, while PostgreSQL remains one of the most popular open-source databases. Connecting these two powerful tools can unlock tremendous insights from your data. In this comprehensive guide, I'll walk you through connecting Power BI to both local PostgreSQL instances and PostgreSQL hosted on Aiven, complete with troubleshooting tips and best practices.&lt;br&gt;
&lt;strong&gt;Part 1: Connecting to Local PostgreSQL&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;br&gt;
Before we begin, ensure you have:&lt;br&gt;
• Power BI Desktop installed&lt;br&gt;
• PostgreSQL running locally&lt;br&gt;
• Database credentials (username, password, database name)&lt;br&gt;
• PostgreSQL ODBC Driver (usually installed with Power BI)&lt;br&gt;
&lt;strong&gt;Step 1: Install PostgreSQL ODBC Driver&lt;/strong&gt;&lt;br&gt;
First, verify you have the PostgreSQL ODBC driver installed:&lt;br&gt;
Windows Check:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to ODBC Data Sources in Windows Search&lt;/li&gt;
&lt;li&gt; Check if the PostgreSQL Unicode or PostgreSQL ANSI driver exists.
If missing, download from:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Official PostgreSQL ODBC driver
https://www.postgresql.org/ftp/odbc/versions/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Power BI Connection Setup&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Open Power BI Desktop&lt;/li&gt;
&lt;li&gt; Click Get Data → More...
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwanrg4dpdrk0xz0kdk98.png" alt=" " width="800" height="186"&gt;
&lt;/li&gt;
&lt;li&gt; Select Database → PostgreSQL database
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs23wjczcajifdd7kwduv.png" alt=" " width="800" height="784"&gt;
&lt;/li&gt;
&lt;li&gt; Click Connect
&lt;strong&gt;Step 3: Configure Connection Parameters&lt;/strong&gt;
Fill in your local PostgreSQL details:
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01tzwvfdukfinj2cbij5.png" alt=" " width="800" height="408"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server: localhost
Database: your_database_name
Username: your_username
Password: your_password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Data Preview and Load&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F258qzjsdqtbofy8bnq0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F258qzjsdqtbofy8bnq0g.png" alt=" " width="800" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Select tables or write custom SQL&lt;/li&gt;
&lt;li&gt; Preview data to verify connection&lt;/li&gt;
&lt;li&gt; Click Load to import or Transform Data for ETL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Connecting to Aiven PostgreSQL&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;What is Aiven?&lt;/strong&gt;&lt;br&gt;
Aiven is a managed cloud database service that provides PostgreSQL as a service with enterprise-grade features.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg0l6sykogvdq0zx060s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg0l6sykogvdq0zx060s.png" alt=" " width="800" height="239"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Step 1: Gather Aiven Connection Details&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Log into your Aiven console&lt;/li&gt;
&lt;li&gt; Select your PostgreSQL service&lt;/li&gt;
&lt;li&gt; Copy connection details from the Overview tab
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key information needed:
• Hostname
• Port (usually 12715)
• Database name
• Username
• Password
• SSL mode
• NB: you must have a paid Aiven account to connect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Download SSL Certificate (Required for Aiven)&lt;/strong&gt;&lt;br&gt;
Aiven requires SSL connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; In Aiven console, go to Overview tab&lt;/li&gt;
&lt;li&gt; Scroll to Connection information&lt;/li&gt;
&lt;li&gt; Download CA certificate
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcajqlpepktlruyjbkq9i.png" alt=" " width="800" height="102"&gt;
&lt;strong&gt;Step 3: Power BI Connection to Aiven&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Get Data → PostgreSQL database&lt;/li&gt;
&lt;li&gt; Enter Aiven connection details:
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c96pes5fotr28wnv6ld.png" alt=" " width="800" height="337"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server: your-service-name.aivencloud.com:12345
Database: defaultdb
Username: avnadmin
Password: your-password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Advanced options → Add SSL parameters:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;powerquery&lt;br&gt;
let&lt;br&gt;
    Source = PostgreSQL.Database(&lt;br&gt;
        "your-service.aivencloud.com:12345", &lt;br&gt;
        "defaultdb", &lt;br&gt;
        [&lt;br&gt;
            CreateNavigationProperties = true,&lt;br&gt;
            SSLMode = "Require",&lt;br&gt;
            UseSSL = true&lt;br&gt;
        ]&lt;br&gt;
    )&lt;br&gt;
in&lt;br&gt;
    Source&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Step 4: Handle SSL Certificate (If Required)&lt;/strong&gt;&lt;br&gt;
For additional SSL verification:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Windows → Place certificate in Trusted Root Certification Authorities&lt;/li&gt;
&lt;li&gt;Power BI → May require certificate path in advanced settings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Part 3: Advanced Configuration&lt;/strong&gt;&lt;br&gt;
Connection String Parameters&lt;/p&gt;

&lt;p&gt;&lt;code&gt;powerquery&lt;br&gt;
// Advanced connection with multiple parameters&lt;br&gt;
let&lt;br&gt;
    Source = PostgreSQL.Database(&lt;br&gt;
        "host:port", &lt;br&gt;
        "database", &lt;br&gt;
        [&lt;br&gt;
            CreateNavigationProperties = false,&lt;br&gt;
            CommandTimeout = #duration(0, 0, 10, 0),&lt;br&gt;
            ConnectionTimeout = #duration(0, 0, 5, 0),&lt;br&gt;
            SSLMode = "Require",&lt;br&gt;
            UseSSL = true&lt;br&gt;
        ]&lt;br&gt;
    )&lt;br&gt;
in&lt;br&gt;
    Source&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4: Common Issues &amp;amp; Troubleshooting&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Issue 1: "DataSource.Error: Unable to Connect"&lt;/strong&gt;&lt;br&gt;
Solutions:&lt;br&gt;
• Verify PostgreSQL service is running&lt;br&gt;
• Check firewall settings&lt;br&gt;
• Confirm port 5432 is open&lt;br&gt;
• Validate credentials&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Test connection from command line
psql -h localhost -p 5432 -U username -d database_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
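The same reachability check can be scripted when `psql` is not installed; this is a minimal Python sketch using only the standard library (the host and port are assumptions — substitute your own):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False

# Default PostgreSQL port; Aiven services use a custom port instead
print("PostgreSQL reachable:", is_port_open("localhost", 5432))
```

A `False` result usually points to the same causes listed above: the service is not running, or a firewall is blocking the port.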

&lt;p&gt;&lt;strong&gt;Issue 2: SSL Connection Errors (Aiven)&lt;/strong&gt;&lt;br&gt;
Solutions:&lt;br&gt;
• Ensure SSL mode is set to "Require"&lt;br&gt;
• Verify certificate installation&lt;br&gt;
• Check Aiven service status&lt;br&gt;
&lt;strong&gt;Issue 3: Performance Issues&lt;/strong&gt;&lt;br&gt;
Optimization tips:&lt;br&gt;
• Use query folding with native database queries&lt;br&gt;
• Import only necessary columns&lt;br&gt;
• Implement incremental refresh&lt;br&gt;
• Use database views for complex transformations&lt;br&gt;
&lt;strong&gt;Issue 4: Authentication Failures&lt;/strong&gt;&lt;br&gt;
Check:&lt;br&gt;
• PostgreSQL pg_hba.conf configuration&lt;br&gt;
• Password encryption method&lt;br&gt;
• User privileges and roles&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sql&lt;br&gt;
-- Check user privileges in PostgreSQL&lt;br&gt;
SELECT usename, useconfig FROM pg_user WHERE usename = 'your_username';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 5: Best Practices&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Security&lt;/strong&gt;&lt;br&gt;
• Use strong, unique passwords&lt;br&gt;
• Enable SSL for all connections&lt;br&gt;
• Implement row-level security in Power BI&lt;br&gt;
• Regular credential rotation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;br&gt;
• Use query folding when possible&lt;br&gt;
• Implement incremental refresh for large datasets&lt;br&gt;
• Create database indexes on filtered columns&lt;br&gt;
• Use direct query for real-time requirements&lt;br&gt;
&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;br&gt;
• Monitor connection timeouts&lt;br&gt;
• Regular Power BI updates&lt;br&gt;
• Database performance tuning&lt;br&gt;
• Backup connection configurations&lt;br&gt;
&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Connecting Power BI to PostgreSQL—whether locally or via Aiven—opens up powerful data analysis capabilities. The key steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Ensure proper drivers and prerequisites&lt;/li&gt;
&lt;li&gt; Gather accurate connection details&lt;/li&gt;
&lt;li&gt; Configure SSL for cloud connections&lt;/li&gt;
&lt;li&gt; Test and optimize performance&lt;/li&gt;
&lt;li&gt; Implement security best practices
By following this guide, you can seamlessly bridge your PostgreSQL data with Power BI's robust visualization capabilities, enabling data-driven decision making across your organization.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/power-bi/" rel="noopener noreferrer"&gt;Power BI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://odbc.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL ODBC Driver&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aiven.io/docs/products/postgresql" rel="noopener noreferrer"&gt;Aiven PostgreSQL Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
      <category>analytics</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Pharmaceutical RCPA Analytics Challenge</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 17 Nov 2025 12:22:55 +0000</pubDate>
      <link>https://dev.to/kenyansa/pharmaceutical-rcpa-analytics-challenge-112o</link>
      <guid>https://dev.to/kenyansa/pharmaceutical-rcpa-analytics-challenge-112o</guid>
      <description>&lt;p&gt;&lt;strong&gt;How I Solved the Pharmaceutical RCPA Analytics Challenge: A Complete Power BI Case Study&lt;/strong&gt;&lt;br&gt;
📋 &lt;strong&gt;Project Background&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The Business Problem&lt;/strong&gt;&lt;br&gt;
A pharmaceutical company needed to transform raw Retail Chemist Prescription Audit (RCPA) data into actionable insights to:&lt;br&gt;
• Track prescription performance against targets&lt;br&gt;
• Monitor doctor conversion trends&lt;br&gt;
• Analyze brand competition across regions&lt;br&gt;
• Support medical representatives with data-driven decisions&lt;br&gt;
&lt;strong&gt;The Data Challenge&lt;/strong&gt;&lt;br&gt;
I received four key datasets that required significant transformation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; RCPA Reporting Form - Complex, unpivoted prescription data&lt;/li&gt;
&lt;li&gt; Product Master - Product hierarchy and mappings&lt;/li&gt;
&lt;li&gt; Brand Targets - Monthly performance targets&lt;/li&gt;
&lt;li&gt; Expected Transformation - The desired output structure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;________________________________________&lt;br&gt;
🛠️ &lt;strong&gt;Phase 1: The ETL Process - Power Query Transformation&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Step 1: Understanding the Raw Data Structure&lt;/strong&gt;&lt;br&gt;
The RCPA data came in a wide, complex format with:&lt;br&gt;
• Multiple medical representatives per row&lt;br&gt;
• Different regions and chemists combined&lt;br&gt;
• Focus products and competitor products mixed&lt;br&gt;
• Inconsistent delimiters and formatting&lt;br&gt;
&lt;strong&gt;Step 2: Creating Focus RCPA Data Table&lt;/strong&gt;&lt;br&gt;
Key Transformations Applied:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Unpivot medical representative columns
= Table.UnpivotOtherColumns(#"Previous Step", {"Region", "Doctor"}, "Attribute", "Value")

// Split chemist information using custom delimiters
= Table.SplitColumn(#"Previous Step", "Chemist", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), {"Chemist.1", "Chemist.2", "Chemist.3"})

// Extract product and prescription quantities
= Table.AddColumn(#"Previous Step", "Custom", each Text.Split([Focus Products], ","))
= Table.ExpandListColumn(#"Previous Step", "Custom")
= Table.SplitColumn(#"Previous Step", "Custom", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Product", "Rx_Qty"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
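Before committing to the Power Query steps, the split-and-expand rule is easy to prototype; a minimal Python sketch assuming the `"Brand:Qty"` comma-separated format shown above (column names are illustrative):

```python
def expand_focus_products(row: dict) -> list[dict]:
    """Split 'BrandA:10,BrandB:5' into one record per product,
    mirroring the Text.Split / ExpandListColumn / SplitColumn steps."""
    records = []
    for pair in row["Focus Products"].split(","):
        product, rx_qty = pair.split(":")
        records.append({
            "Region": row["Region"],
            "Doctor": row["Doctor"],
            "Product": product.strip(),
            "Rx_Qty": int(rx_qty),
        })
    return records

row = {"Region": "Nairobi", "Doctor": "Dr. A", "Focus Products": "BrandA:10,BrandB:5"}
print(expand_focus_products(row))
```

Running the rule on a handful of rows like this makes it easy to spot delimiter inconsistencies before wiring the same logic into Power Query.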


&lt;p&gt;&lt;strong&gt;Data Quality Checks:&lt;/strong&gt;&lt;br&gt;
• Removed empty rows and null values&lt;br&gt;
• Standardized text formatting&lt;br&gt;
• Ensured each chemist had exactly 9 focus products&lt;br&gt;
• Validated prescription quantity formats&lt;br&gt;
&lt;strong&gt;Step 3: Creating Competitor RCPA Data Table&lt;/strong&gt;&lt;br&gt;
Similar transformations but with different business rules:&lt;br&gt;
• Each chemist contained 6 competitor products&lt;br&gt;
• Different product mapping logic&lt;br&gt;
• Separate relationship structure&lt;br&gt;
&lt;strong&gt;Step 4: Preparing Dimension Tables&lt;/strong&gt;&lt;br&gt;
Product Master Cleanup:&lt;br&gt;
• Removed header rows&lt;br&gt;
• Standardized product codes and names&lt;br&gt;
• Created unique identifiers for relationships&lt;br&gt;
Brand Targets Preparation:&lt;br&gt;
• Cleaned target quantities&lt;br&gt;
• Established proper date hierarchies&lt;br&gt;
• Created product-code mappings&lt;/p&gt;



&lt;p&gt;🔗 &lt;strong&gt;Phase 2: Data Modeling&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The Schema Design&lt;/strong&gt;&lt;br&gt;
I implemented a hybrid star-snowflake schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fact Tables:
├── Focus_RCPA_Data
└── Competitor_RCPA_Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dimension Tables:
├── Product_Master
├── Brand_Targets
├── Doctor_Dim
├── Region_Dim
├── Medical_Rep_Dim
└── Date_Dim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Relationship Strategy&lt;/strong&gt;&lt;br&gt;
• One-to-Many relationships from dimensions to facts&lt;br&gt;
• Bi-directional filtering where appropriate&lt;br&gt;
• Role-playing dimensions for date analysis&lt;br&gt;
• Bridge tables for many-to-many relationships&lt;br&gt;
&lt;strong&gt;DAX Measures Foundation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Core performance metrics
Total Rx = SUM(Focus_RCPA_Data[Rx_Qty])
Total Target = SUM(Brand_Targets[Target_Qty])
Achievement % = DIVIDE([Total Rx], [Total Target], 0)

-- Competition analysis
Focus Brand Share = DIVIDE([Focus Rx], [Total Market Rx])
Competitor Share = 1 - [Focus Brand Share]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
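The `DIVIDE` call above exists to avoid division-by-zero errors when a brand has no target; the same guard, sketched in Python for illustration (names mirror the measures):

```python
def achievement_pct(total_rx: float, total_target: float, alternate: float = 0.0) -> float:
    """Mimic DAX DIVIDE: return the alternate result when the denominator is zero."""
    return total_rx / total_target if total_target else alternate

print(achievement_pct(450, 500))  # 0.9
print(achievement_pct(450, 0))    # 0.0, instead of a division-by-zero error
```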






&lt;p&gt;📊 &lt;strong&gt;Phase 3: Visualization Development&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Dashboard 1: Doctor Rx Performance&lt;/strong&gt;&lt;br&gt;
Design Approach:&lt;br&gt;
• Hierarchical drill-down: Region → Medical Rep → Doctor → Brand&lt;br&gt;
• KPI cards for quick performance assessment&lt;br&gt;
• Matrix visual for detailed analysis&lt;br&gt;
• Conditional formatting for target achievement&lt;br&gt;
&lt;strong&gt;Key Insights Delivered:&lt;/strong&gt;&lt;br&gt;
• Top-performing medical representatives by region&lt;br&gt;
• Brands exceeding or missing targets&lt;br&gt;
• Regional performance patterns&lt;br&gt;
• Doctor-level prescription trends&lt;br&gt;
&lt;strong&gt;Dashboard 2: Doctor Conversion Status&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The Challenge:&lt;/strong&gt;&lt;br&gt;
Defining and tracking "Doctor Conversion" - when doctors prescribe target quantities for at least 3 consecutive RCPAs.&lt;br&gt;
&lt;strong&gt;Solution Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Counts months meeting target within the selected window
-- (an approximation of the strict 3-consecutive-period rule)
Doctor Conversion Status = 
VAR ConsecutivePeriods = 
    CALCULATE(
        COUNTROWS(VALUES('Date'[Month])),
        FILTER(
            ALLSELECTED('Date'[Month]),
            [Total Rx] &amp;gt;= [Total Target]
        )
    )
RETURN
    IF(ConsecutivePeriods &amp;gt;= 3, "Converted", "Not Converted")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
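The strict business rule — target met in at least 3 consecutive RCPAs — is easier to see procedurally; a minimal Python sketch, assuming the monthly `(rx, target)` pairs are in chronological order:

```python
def is_converted(monthly: list[tuple[int, int]], required_run: int = 3) -> bool:
    """True if the doctor met target in at least `required_run`
    consecutive RCPA periods."""
    run = 0
    for rx, target in monthly:
        # Extend the streak on a hit, reset it on a miss
        run = run + 1 if rx >= target else 0
        if run >= required_run:
            return True
    return False

# Met target in months 2-4 → converted
print(is_converted([(80, 100), (120, 100), (110, 100), (105, 100)]))  # True
# Streak broken in month 2 → not converted
print(is_converted([(120, 100), (80, 100), (120, 100)]))              # False
```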



&lt;p&gt;&lt;strong&gt;Visualization Features:&lt;/strong&gt;&lt;br&gt;
• Funnel chart showing conversion pipeline&lt;br&gt;
• Timeline analysis of conversion trends&lt;br&gt;
• Doctor profiling with prescription history&lt;br&gt;
• Alert system for at-risk conversions&lt;br&gt;
&lt;strong&gt;Dashboard 3: Brand Competition Analysis&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Methodology:&lt;/strong&gt;&lt;br&gt;
• Market share calculation by region&lt;br&gt;
• Competitive benchmarking&lt;br&gt;
• Trend analysis over time&lt;br&gt;
• Geographic heat maps&lt;br&gt;
&lt;strong&gt;Competitive Intelligence:&lt;/strong&gt;&lt;br&gt;
• Identified regional strongholds and weak spots&lt;br&gt;
• Tracked competitor market penetration&lt;br&gt;
• Provided insights for regional strategy adjustments&lt;/p&gt;




&lt;p&gt;🎯 &lt;strong&gt;Key Technical Challenges &amp;amp; Solutions&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Challenge 1: Complex Data Unpivoting&lt;/strong&gt;&lt;br&gt;
Problem: Multiple levels of nested data in single columns&lt;br&gt;
Solution: Custom Power Query functions with iterative splitting and error handling&lt;br&gt;
&lt;strong&gt;Challenge 2: Doctor Conversion Logic&lt;/strong&gt;&lt;br&gt;
Problem: Business rule required 3 consecutive periods of target achievement&lt;br&gt;
Solution: DAX time intelligence functions with rolling window calculations&lt;br&gt;
&lt;strong&gt;Challenge 3: Performance Optimization&lt;/strong&gt;&lt;br&gt;
Problem: Large dataset causing slow report loading&lt;br&gt;
Solution:&lt;br&gt;
• Query folding optimization&lt;br&gt;
• Aggregated tables for summary views&lt;br&gt;
• Strategic relationship management&lt;br&gt;
&lt;strong&gt;Challenge 4: Dynamic Competition Analysis&lt;/strong&gt;&lt;br&gt;
Problem: Comparing focus brands against multiple competitors&lt;br&gt;
Solution: Parameter tables and what-if analysis for flexible benchmarking&lt;/p&gt;




&lt;p&gt;📈 &lt;strong&gt;Business Impact Delivered&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Quantitative Results&lt;/strong&gt;&lt;br&gt;
• 97% data accuracy in transformation process&lt;br&gt;
• 60% reduction in manual reporting time&lt;br&gt;
• Real-time performance tracking vs monthly manual process&lt;br&gt;
• 360-degree view of prescription ecosystem&lt;br&gt;
&lt;strong&gt;Strategic Insights Generated&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Identification of top 15% doctors driving 45% of prescriptions&lt;/li&gt;
&lt;li&gt; Detection of regional competition patterns&lt;/li&gt;
&lt;li&gt; Optimization of medical representative territories&lt;/li&gt;
&lt;li&gt; Forecasting of conversion pipeline health&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;________________________________________&lt;br&gt;
🏆 &lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Technical Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Power Query is powerful for complex data reshaping&lt;/li&gt;
&lt;li&gt; DAX context transition is crucial for accurate calculations&lt;/li&gt;
&lt;li&gt; Data model design directly impacts user experience&lt;/li&gt;
&lt;li&gt; Iterative development with stakeholder feedback is essential&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Business Insights&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Clean data foundation enables advanced analytics&lt;/li&gt;
&lt;li&gt; User-friendly visualizations drive adoption&lt;/li&gt;
&lt;li&gt; Regular data validation maintains trust in insights&lt;/li&gt;
&lt;li&gt; Scalable architecture supports future requirements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;________________________________________&lt;br&gt;
🔮 &lt;strong&gt;Future Enhancements&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Planned Improvements&lt;/strong&gt;&lt;br&gt;
• Machine learning integration for prescription forecasting&lt;br&gt;
• Mobile-optimized views for field representatives&lt;br&gt;
• Automated alerting system for performance deviations&lt;br&gt;
• Integration with CRM data for complete customer view&lt;br&gt;
&lt;strong&gt;Expansion Opportunities&lt;/strong&gt;&lt;br&gt;
• Additional data sources (inventory, marketing campaigns)&lt;br&gt;
• Advanced analytics (prescription pattern recognition)&lt;br&gt;
• Predictive modeling for doctor conversion probability&lt;br&gt;
• Executive dashboard with strategic KPIs&lt;/p&gt;

&lt;p&gt;________________________________________&lt;br&gt;
✅ &lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
This project demonstrated how strategic data transformation combined with thoughtful visualization can turn complex pharmaceutical data into actionable business intelligence. The solution not only met the immediate reporting requirements but also established a scalable foundation for ongoing analytics and decision support.&lt;br&gt;
&lt;strong&gt;The key success factors were:&lt;/strong&gt;&lt;br&gt;
• Deep understanding of business processes&lt;br&gt;
• Robust ETL architecture&lt;br&gt;
• User-centered design approach&lt;br&gt;
• Continuous validation with stakeholders&lt;br&gt;
By solving this challenge, we enabled data-driven decision making across sales, marketing, and medical teams, ultimately supporting better patient outcomes through optimized prescription strategies.&lt;/p&gt;

</description>
      <category>portfolio</category>
      <category>analytics</category>
      <category>microsoft</category>
      <category>challenge</category>
    </item>
    <item>
      <title>Power BI: Star Schema vs Snowflake Schema</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 17 Nov 2025 11:54:01 +0000</pubDate>
      <link>https://dev.to/kenyansa/power-bi-star-schema-vs-snowflake-schema-f0b</link>
      <guid>https://dev.to/kenyansa/power-bi-star-schema-vs-snowflake-schema-f0b</guid>
      <description>&lt;p&gt;&lt;strong&gt;Power BI: Star Schema vs Snowflake Schema&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Star Schema&lt;/strong&gt;&lt;br&gt;
A star schema is defined as the simplest data warehouse schema where one or more fact tables reference any number of dimension tables in a star-like structure.&lt;br&gt;
Structure&lt;br&gt;
• Fact Table: Central table containing business metrics and foreign keys&lt;br&gt;
• Dimension Tables: Surrounding tables connected directly to the fact table&lt;br&gt;
• Denormalized: Dimension tables contain all related data&lt;br&gt;
Example of star schema&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnrt28vi6ltp7exg097.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnrt28vi6ltp7exg097.png" alt=" " width="800" height="522"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Advantages&lt;/strong&gt;&lt;br&gt;
• Simpler Queries: Fewer JOINs required&lt;br&gt;
• Better Performance: Faster query execution&lt;br&gt;
• Easy to Understand: Intuitive structure for business users&lt;br&gt;
• Optimized for Reporting: Ideal for Power BI and analytics&lt;br&gt;
• Reduced Complexity: Minimal table relationships&lt;br&gt;
&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;br&gt;
• Data Redundancy: Repeated data in dimension tables&lt;br&gt;
• Storage Inefficiency: Larger storage requirements&lt;br&gt;
• Update Anomalies: Potential data inconsistency&lt;br&gt;
• Less Flexible: Harder to accommodate changes&lt;br&gt;
&lt;strong&gt;Snowflake Schema&lt;/strong&gt;&lt;br&gt;
A snowflake schema is defined as a normalized version of the star schema where dimension tables are broken down into multiple related tables.&lt;br&gt;
Structure&lt;br&gt;
• Fact Table: Central table with foreign keys&lt;br&gt;
• Normalized Dimensions: Hierarchical dimension tables&lt;br&gt;
• Multiple Levels: Dimensions split into sub-dimensions&lt;br&gt;
&lt;strong&gt;Example of a snowflake schema&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg59ng0ec2ea1oq7p9unb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg59ng0ec2ea1oq7p9unb.jpg" alt=" " width="634" height="413"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Advantages&lt;/strong&gt;&lt;br&gt;
• Reduced Data Redundancy: Normalized structure&lt;br&gt;
• Storage Efficiency: Smaller storage footprint&lt;br&gt;
• Data Integrity: Better consistency&lt;br&gt;
• Flexibility: Easier to accommodate changes&lt;br&gt;
• Better for OLTP: Closer to operational databases&lt;br&gt;
&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;br&gt;
• Complex Queries: More JOINs required&lt;br&gt;
• Slower Performance: Reduced query speed&lt;br&gt;
• Harder to Understand: More complex for business users&lt;br&gt;
• Maintenance Overhead: More tables to manage&lt;br&gt;
&lt;strong&gt;When to Use Each Schema&lt;/strong&gt;&lt;br&gt;
Use Star Schema When:&lt;br&gt;
• Primary Use Case: Business intelligence and reporting&lt;br&gt;
• Performance Critical: Fast query response needed&lt;br&gt;
• Business User Focus: End users need simplicity&lt;br&gt;
• Power BI/Tableau: Optimized for visualization tools&lt;br&gt;
• Read-Intensive: Heavy reporting workload&lt;br&gt;
• Data Marts: Department-specific analytics&lt;br&gt;
Use Snowflake Schema When:&lt;br&gt;
• Primary Use Case: Complex data relationships&lt;br&gt;
• Storage Constraints: Limited storage capacity&lt;br&gt;
• Data Integrity: High consistency requirements&lt;br&gt;
• Source System: Mirroring normalized source data&lt;br&gt;
• ETL Processes: Easier incremental loading&lt;br&gt;
• Regulatory Compliance: Strict data governance&lt;br&gt;
&lt;strong&gt;Power BI Considerations&lt;/strong&gt;&lt;br&gt;
Star Schema is Recommended Because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; DAX Optimization: Better performance with measures&lt;/li&gt;
&lt;li&gt; Relationship Simplicity: Cleaner model relationships&lt;/li&gt;
&lt;li&gt; User-Friendly: Easier for report consumers&lt;/li&gt;
&lt;li&gt; Query Performance: Faster refresh and calculation&lt;/li&gt;
&lt;li&gt; Best Practice: Microsoft's recommended approach&lt;/li&gt;
&lt;/ol&gt;
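The extra join a snowflake schema requires can be demonstrated with a toy model; a minimal sketch using Python's built-in sqlite3 (table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star: denormalized dimension holds product AND category in one table
cur.execute("CREATE TABLE dim_product_star (product_id INT, product TEXT, category TEXT)")
# Snowflake: category is normalized out into its own table
cur.execute("CREATE TABLE dim_product (product_id INT, product TEXT, category_id INT)")
cur.execute("CREATE TABLE dim_category (category_id INT, category TEXT)")
cur.execute("CREATE TABLE fact_sales (product_id INT, amount REAL)")

cur.execute("INSERT INTO dim_product_star VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 10)")
cur.execute("INSERT INTO dim_category VALUES (10, 'Hardware')")
cur.execute("INSERT INTO fact_sales VALUES (1, 250.0)")

# Star schema: ONE join to reach the category
star = cur.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product_star d USING (product_id)
    GROUP BY d.category
""").fetchall()

# Snowflake schema: TWO joins for the same answer
snow = cur.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p USING (product_id)
    JOIN dim_category c USING (category_id)
    GROUP BY c.category
""").fetchall()

print(star == snow)  # same result, different join depth
```

Both queries return the same totals; the snowflake version simply pays for one more join per normalized level, which is the performance trade-off Power BI's star-schema guidance is about.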

&lt;p&gt;&lt;strong&gt;Reference Table showing the differences&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Aspect           Star Schema       Snowflake Schema
Performance      High              Lower
Storage          Less efficient    More efficient
Complexity       Low               High
Flexibility      Minimal           Highly flexible
Data Integrity   Lower             Higher
Ease of Use      Very simple       More complex

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
For Power BI implementations, the star schema is generally preferred due to its performance benefits, simplicity, and alignment with business reporting needs. However, understanding both schemas allows you to make informed decisions based on specific project requirements, data complexity, and organizational constraints.&lt;br&gt;
Recommendation: Always start with a star schema, and move to a snowflake only when specific normalization benefits outweigh the performance costs.&lt;br&gt;
&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;br&gt;
• Star Schema = Performance + Simplicity&lt;br&gt;
• Snowflake Schema = Storage Efficiency + Data Integrity&lt;br&gt;
• Power BI prefers Star Schema for better performance&lt;br&gt;
• Choose based on your specific use case and constraints&lt;/p&gt;

</description>
      <category>schemas</category>
    </item>
    <item>
      <title>Unlocking Agricultural Insights: The Power of BI and DAX in Data Analysis</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Fri, 10 Oct 2025 09:39:12 +0000</pubDate>
      <link>https://dev.to/kenyansa/unlocking-agricultural-insights-the-power-of-bi-and-dax-in-data-analysis-4co</link>
      <guid>https://dev.to/kenyansa/unlocking-agricultural-insights-the-power-of-bi-and-dax-in-data-analysis-4co</guid>
      <description>&lt;p&gt;In today's data-driven world, simply having information is not enough; the ability to understand and act on it is what creates a competitive edge. This is where Power BI shines. Power BI is a powerful business analytics tool from Microsoft that transforms raw, disconnected data into interactive and visually compelling reports and dashboards. Its user-friendly interface allows anyone, from analysts to farmers, to connect to various data sources, model the data, and uncover hidden trends with just a few clicks.&lt;br&gt;
 Power BI has something called DAX (Data Analysis Expressions). DAX is a library of functions and formulas used to create custom calculations and more complex analysis on your data. While Power BI can show you basic information, DAX allows you to ask deeper, more specific questions of your data.&lt;br&gt;
Let's see different ways DAX functions can be applied using our class example of the Kenya Crops Dataset to derive meaningful agricultural insights.&lt;br&gt;
&lt;strong&gt;DAX in Action: Analyzing the Kenya Crops Dataset&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. Mathematical Functions: The Basics of Measurement&lt;/strong&gt;&lt;br&gt;
These are the foundation of any analysis. Functions like SUM and AVERAGE help us understand scale and central tendency.&lt;br&gt;
• &lt;code&gt;Example: Total_Revenue = SUM('Kenya_crops_Dataset'[Revenue (KES)]) calculates the total income from all crop sales.&lt;/code&gt;&lt;br&gt;
• &lt;code&gt;Example: Average_Yield = AVERAGE('Kenya_crops_Dataset'[Yield (kg/ha)]) gives us the typical crop yield per hectare across different regions.&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;2. Text Functions: Cleaning and Organizing Data&lt;/strong&gt;&lt;br&gt;
Data is often messy. Text functions help standardize and extract key information.&lt;br&gt;
• &lt;code&gt;Example: County_Code = LEFT('Kenya_crops_Dataset' [County Name], 3) creates a short county code by taking the first three letters of the county's name (e.g., "Nai" for Nairobi).&lt;/code&gt;&lt;br&gt;
• &lt;code&gt;Example: Full_Location = CONCATENATE('Kenya_crops_Dataset' [Farm Name], ", ", 'Kenya_crops_Dataset' [County]) combines the farm name and county into a single, readable location string.&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;3. Date &amp;amp; Time Functions: Tracking Trends Over Time&lt;/strong&gt;&lt;br&gt;
Agriculture is deeply seasonal. These functions are crucial for time-based analysis.&lt;br&gt;
• &lt;code&gt;Example: Harvest_Year = YEAR('Kenya_crops_Dataset' [Harvest Date]) extracts just the year from a full date, allowing us to compare performance year-over-year.&lt;/code&gt;&lt;br&gt;
• &lt;code&gt;Example: Harvest_Day = DAY('Kenya_crops_Dataset'[Harvest Date])&lt;/code&gt; extracts the day of the month from the harvest date.&lt;br&gt;
• &lt;code&gt;Example: Current_date = TODAY()&lt;/code&gt; returns the current date.&lt;br&gt;
• &lt;code&gt;Example: Revenue_YTD = TOTALYTD(SUM('Kenya_crops_Dataset' [Sales Revenue]), 'Date'[Date])&lt;/code&gt; calculates the total revenue from the start of the year up to the current date in the report.&lt;br&gt;
&lt;strong&gt;4. Filter and Iterator Functions: Building Smarter Calculations&lt;/strong&gt;&lt;br&gt;
These functions introduce decision-making and row-by-row logic into your formulas.&lt;br&gt;
• &lt;code&gt;Example: Sum of Yield Crops = CALCULATE(SUM('Kenya_crops_Dataset'[Yield (Kg)]))&lt;/code&gt; calculates the grand total of all crop yields in the entire dataset.&lt;br&gt;
• &lt;code&gt;Example: averagex_calculated_profit = AVERAGEX('Kenya_Crops_Dataset', 'Kenya_Crops_Dataset'[Selling price] - 'Kenya_Crops_Dataset'[Cost of Production (KES)])&lt;/code&gt; calculates the average profit per row in the dataset by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Going row by row through the 'Kenya_Crops_Dataset' table.&lt;/li&gt;
&lt;li&gt;For each row, calculating Selling price - Cost of Production (the profit for that specific crop sale).&lt;/li&gt;
&lt;li&gt;After calculating the profit for every individual row, taking the average of all these profit values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;• &lt;code&gt;Example: Averagex selling price = AVERAGEX('Kenya_Crops_Dataset', 'Kenya_Crops_Dataset'[Yield (Kg)] * 'Kenya_Crops_Dataset'[Market Price (KES/Kg)] * 'Kenya_Crops_Dataset'[Planted Area (Acres)])&lt;/code&gt; calculates the average "potential revenue per farm" by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Going row by row through each farm record.&lt;/li&gt;
&lt;li&gt;For each farm, calculating Yield (Kg) × Market Price (KES/Kg) × Planted Area (Acres).&lt;/li&gt;
&lt;li&gt;After calculating this "potential revenue" for every farm, taking the average of all these values.&lt;/li&gt;
&lt;/ul&gt;
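&lt;p&gt;For readers who think in pandas, the row-by-row logic of AVERAGEX maps naturally onto a per-row column expression followed by a mean. A minimal sketch, using invented numbers rather than the actual Kenya Crops Dataset:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical rows standing in for the Kenya crops data
df = pd.DataFrame({
    "selling_price": [120.0, 95.0, 150.0],
    "cost_of_production": [80.0, 70.0, 90.0],
})

# AVERAGEX-style logic: compute profit per row, then average the results
df["profit"] = df["selling_price"] - df["cost_of_production"]
average_profit = df["profit"].mean()
print(average_profit)  # mean of the per-row profits 40, 25, and 60
```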

&lt;p&gt;&lt;strong&gt;Conclusion: From Data to Decisions&lt;/strong&gt;&lt;br&gt;
By combining the visual power of Power BI with the analytical depth of DAX, businesses and farmers can move beyond simple observation to proactive decision-making. The Ministry of Agriculture could use these tools to identify which regions are most profitable for specific cash crops, optimize planting schedules based on historical yield data, and create dynamic reports that track key performance indicators in real time.&lt;br&gt;
The true value of Power BI and DAX lies in their ability to make data useful. They turn abstract numbers into a clear, visual story. For a farmer in Kenya, this story could mean the difference between guessing which crop to plant and knowing with data-backed confidence which one will be most profitable and resilient for the coming season.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>microsoft</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Python vs SQL: Which is Best for Querying and Cleaning Data?</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 19 May 2025 20:46:16 +0000</pubDate>
      <link>https://dev.to/kenyansa/python-vs-sql-which-is-best-for-querying-and-cleaning-data-pcf</link>
      <guid>https://dev.to/kenyansa/python-vs-sql-which-is-best-for-querying-and-cleaning-data-pcf</guid>
<description>&lt;p&gt;When working with data, we are familiar with two tools: SQL and Python. Both are important for data professionals, but they serve different purposes. Let me break down which one you should use for querying and cleaning data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Querying Data: SQL Is the Best&lt;/strong&gt;&lt;br&gt;
Here is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimized for Databases&lt;/strong&gt; – SQL is built specifically for querying structured data in relational databases (PostgreSQL, MySQL, BigQuery, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster Queries&lt;/strong&gt; – Databases are optimized for SQL, making it much faster than Python for filtering, aggregating, and joining tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Syntax&lt;/strong&gt; – need sales per customer since the start of 2024? SQL is direct:&lt;br&gt;
&lt;strong&gt;SELECT&lt;/strong&gt; customer_id, SUM(amount)&lt;br&gt;
&lt;strong&gt;FROM&lt;/strong&gt; sales&lt;br&gt;
&lt;strong&gt;WHERE&lt;/strong&gt; date &amp;gt;= '2024-01-01'&lt;br&gt;
&lt;strong&gt;GROUP BY&lt;/strong&gt; customer_id;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works with Big Datasets&lt;/strong&gt; – SQL databases handle billions of rows efficiently, unlike Python, which struggles with memory.&lt;/li&gt;
&lt;/ul&gt;
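&lt;p&gt;To experiment with this kind of in-database aggregation without installing a server, Python's built-in sqlite3 module works as a stand-in engine. The table and rows below are invented for the demo:&lt;/p&gt;

```python
import sqlite3

# In-memory database with a toy sales table (invented demo data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL, date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, "2024-02-01"), (1, 50.0, "2024-03-10"),
     (2, 75.0, "2023-12-31"), (2, 200.0, "2024-01-15")],
)

# Filtering, aggregating, and grouping happen entirely inside the database
rows = conn.execute(
    "SELECT customer_id, SUM(amount) FROM sales "
    "WHERE date >= '2024-01-01' GROUP BY customer_id"
).fetchall()
print(rows)  # the 2023 sale is excluded by the WHERE clause
```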

&lt;p&gt;&lt;strong&gt;When to use Python for querying&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unstructured Data&lt;/strong&gt; – If your data is in JSON, APIs, or web scraped, Python (with requests + pandas) is more flexible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Calculations&lt;/strong&gt; – SQL can do math, but Python (NumPy, SciPy) is better for complex statistics or machine learning prep.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Cleaning Data: Python Is More Powerful&lt;/strong&gt;&lt;br&gt;
Why Python is the best for data cleaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More Flexible Transformations&lt;/strong&gt; – SQL can filter and aggregate, but Python (pandas) excels at:

&lt;ul&gt;
&lt;li&gt;Handling missing values (&lt;code&gt;df.fillna()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Regex-based text cleaning (&lt;code&gt;df['col'].str.replace()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Complex reshaping (&lt;code&gt;pivot_table&lt;/code&gt;, &lt;code&gt;melt&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Custom functions (&lt;code&gt;apply&lt;/code&gt; with lambda logic)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Better for Messy Data&lt;/strong&gt; – CSV files, Excel sheets, and semi-structured data are easier to clean in Python.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation &amp;amp; Reproducibility&lt;/strong&gt; – Python scripts can clean data the same way every time.&lt;/li&gt;
&lt;/ul&gt;
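&lt;p&gt;A minimal sketch showing those pandas strengths together; the DataFrame and its columns are invented for illustration:&lt;/p&gt;

```python
import pandas as pd

# Messy, invented sample data: mixed case, stray spaces, a missing value, a duplicate
df = pd.DataFrame({
    "email": ["  Alice@Example.com", "bob@example.com ", None, "bob@example.com "],
    "city": ["Nairobi", "Mombasa", "Kisumu", "Mombasa"],
})

df["email"] = df["email"].fillna("unknown")        # handle missing values
df["email"] = df["email"].str.strip().str.lower()  # normalize whitespace and case
df = df.drop_duplicates()                          # remove exact duplicates
print(len(df))  # the duplicate bob row is dropped, leaving 3 rows
```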

&lt;p&gt;&lt;strong&gt;When to use SQL for Cleaning&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic Filtering &amp;amp; Deduplication – SQL can remove duplicates (DISTINCT), filter rows (WHERE), and apply simple transformations (CASE WHEN).&lt;/li&gt;
&lt;li&gt;Database-Level Cleaning – If your data lives in a database, cleaning it there avoids extra steps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's see examples of cleaning in Python vs SQL.&lt;br&gt;
&lt;strong&gt;Python (pandas):&lt;/strong&gt;&lt;br&gt;
df['email'] = df['email'].str.lower().str.strip()  # Clean emails&lt;br&gt;
df.drop_duplicates(inplace=True)  # Remove duplicates&lt;br&gt;
&lt;strong&gt;SQL:&lt;/strong&gt;&lt;br&gt;
UPDATE customers&lt;br&gt;
SET email = LOWER(TRIM(email));  -- Clean emails&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DELETE FROM&lt;/strong&gt; customers&lt;br&gt;
&lt;strong&gt;WHERE&lt;/strong&gt; row_id &lt;strong&gt;NOT IN&lt;/strong&gt; (&lt;br&gt;
  &lt;strong&gt;SELECT MIN&lt;/strong&gt;(row_id)&lt;br&gt;
  &lt;strong&gt;FROM&lt;/strong&gt; customers&lt;br&gt;
  &lt;strong&gt;GROUP BY&lt;/strong&gt; email);  -- Remove duplicates&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Performance &amp;amp; Scalability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SQL is faster for querying large datasets (thanks to database optimizations like indexing).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python (Pandas) can slow down with &amp;gt;1M rows unless you use optimized libraries like Dask or Polars.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best Practice: Do heavy filtering/aggregation in SQL first, then refine in Python.&lt;/p&gt;

&lt;p&gt;In my opinion, the best analysts learn to use both:&lt;br&gt;
Step 1: Pull data efficiently with SQL.&lt;br&gt;
Step 2: Clean and analyze further with Python.&lt;br&gt;
An example workflow:&lt;br&gt;
&lt;strong&gt;SQL:&lt;/strong&gt;&lt;br&gt;
-- Fast filtering and aggregation&lt;br&gt;
&lt;strong&gt;SELECT&lt;/strong&gt; user_id, &lt;strong&gt;COUNT&lt;/strong&gt;(*) AS purchases&lt;br&gt;
&lt;strong&gt;FROM&lt;/strong&gt; transactions&lt;br&gt;
&lt;strong&gt;GROUP BY&lt;/strong&gt; user_id;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
# Advanced cleaning and visualization&lt;br&gt;
df = pd.read_sql_query("SELECT * FROM clean_data", engine)&lt;br&gt;
df['purchase_category'] = df.apply(lambda x: categorize(x), axis=1)&lt;br&gt;
df.plot(kind='bar')  # Visualize&lt;/p&gt;
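&lt;p&gt;A runnable version of that SQL-then-Python hand-off, using sqlite3 so no external database is needed; the transactions table and the categorize rule are invented for the sketch:&lt;/p&gt;

```python
import sqlite3
import pandas as pd

# Step 1: heavy lifting in SQL (invented demo transactions table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?)",
                 [(1,), (1,), (1,), (2,)])
df = pd.read_sql_query(
    "SELECT user_id, COUNT(*) AS purchases FROM transactions GROUP BY user_id",
    conn,
)

# Step 2: refine in Python (hypothetical categorize rule)
def categorize(row):
    return "frequent" if row["purchases"] >= 2 else "occasional"

df["purchase_category"] = df.apply(categorize, axis=1)
print(df)
```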

&lt;p&gt;&lt;strong&gt;In Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For querying: SQL is faster and more efficient (especially in databases).&lt;/li&gt;
&lt;li&gt;For cleaning: Python is more powerful and flexible.&lt;/li&gt;
&lt;li&gt;Best combo: Use SQL first to get the right data, then Python to refine it.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Excel vs. Power BI: A Comprehensive Comparison for Data Analytics</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Wed, 14 May 2025 21:01:05 +0000</pubDate>
      <link>https://dev.to/kenyansa/excel-vs-power-bi-a-comprehensive-comparison-for-data-analytics-41nd</link>
      <guid>https://dev.to/kenyansa/excel-vs-power-bi-a-comprehensive-comparison-for-data-analytics-41nd</guid>
<description>&lt;p&gt;In the world of data analytics, Microsoft offers two powerful tools, Excel and Power BI, that cater to different needs and skill levels. While Excel has been the go-to tool for decades, Power BI is gaining popularity for advanced data visualization and business intelligence. But which one is right for your needs as a data analyst? Let’s compare them...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Handling and Scalability&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Excel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Best for small to medium datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limited performance with very large datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manual data refresh unless you use Power Query or VBA automation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Works well with big data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connects directly to databases (SQL Server, PostgreSQL, etc.).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic refresh options with scheduled updates in the Power BI Service.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at this comparison, Power BI is better for scalability and large datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data transformation and cleaning.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Excel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Uses Power Query (built-in since Excel 2016) for ETL (Extract, &lt;br&gt;
Transform, Load).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Good for basic cleaning but requires manual steps for complex &lt;br&gt;
transformations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Formulas (VLOOKUP, INDEX-MATCH) can be cumbersome for large datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Also uses Power Query but with more robust data modeling capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handles complex transformations better, especially with M language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supports relationships between tables (like a relational database).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at this comparison, Power BI is better for advanced data transformation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data Visualization &amp;amp; Reporting&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Excel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Basic charts (bar, line, pie) and PivotTables for analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limited interactivity—users must manually filter and drill down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dashboards possible but require manual setup and lack real-time &lt;br&gt;
interactivity.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Rich, interactive visualizations with drag-and-drop simplicity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom visuals from the marketplace (e.g., heatmaps, Sankey diagrams).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Drill-through, cross-filtering, and tooltips for deeper insights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mobile-friendly dashboards with real-time updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at this comparison, Power BI is more dynamic and better suited for professional-grade reporting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Collaboration and sharing.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Excel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Files shared via email, OneDrive, or SharePoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Version control issues with multiple users editing simultaneously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limited real-time collaboration (unless using Excel Online).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Publish reports to Power BI Service for cloud-based sharing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Role-based access control (RBAC) for security.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real-time dashboards with automatic refreshes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Teams integration for seamless collaboration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at this comparison, Power BI is better suited for enterprise-level sharing and collaboration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Learning Curve &amp;amp; Accessibility&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Excel:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Widely used and familiar to most professionals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy for beginners but requires expertise for advanced analytics (PivotTables, Power Pivot, DAX).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No specialized training needed for basic tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power BI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Steeper learning curve for beginners, especially DAX.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More intuitive for those familiar with Power Query and data modeling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Free version available, but Pro license needed for full features.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This comparison shows that Excel is good for beginners, while Power BI suits those willing to upskill.&lt;br&gt;
&lt;strong&gt;In conclusion:&lt;/strong&gt;&lt;br&gt;
Choose Excel if you work with small datasets, need quick calculations, or prefer a familiar tool.&lt;br&gt;
Choose Power BI if you handle huge datasets, need interactive dashboards, or work in team environments.&lt;br&gt;
Or you can simply use both...&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The power of Keys in SQL</title>
      <dc:creator>Kenyansa Felix Amenya</dc:creator>
      <pubDate>Mon, 12 May 2025 18:33:34 +0000</pubDate>
      <link>https://dev.to/kenyansa/the-power-of-keys-in-sql-5860</link>
      <guid>https://dev.to/kenyansa/the-power-of-keys-in-sql-5860</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Power of Keys in SQL: Simplifying Data Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In SQL, keys are fundamental to organizing, retrieving, and analyzing data efficiently. They establish relationships between tables, enforce data integrity, and optimize query performance, making data analysis faster and more reliable.&lt;br&gt;
This article will explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Types of SQL Keys (Primary, Foreign, Composite, etc.)&lt;/li&gt;
&lt;li&gt;How keys improve data analysis&lt;/li&gt;
&lt;li&gt;Practical examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1. Types of Keys in SQL&lt;/strong&gt;&lt;br&gt;
A. &lt;strong&gt;Primary Key (PK)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Uniquely identifies each row in a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No duplicates or NULLs allowed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatically indexed, speeding up searches.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
SQL&lt;br&gt;
CREATE TABLE Employees (&lt;br&gt;
    emp_id INT PRIMARY KEY,&lt;br&gt;
    name VARCHAR(100)&lt;br&gt;
);&lt;/p&gt;

&lt;p&gt;B. &lt;strong&gt;Foreign Key (FK)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Links two tables by referencing a Primary Key.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensures referential integrity (prevents orphaned records).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supports JOIN operations for data analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
SQL&lt;br&gt;
CREATE TABLE Orders (&lt;br&gt;
    order_id INT PRIMARY KEY,&lt;br&gt;
    emp_id INT,&lt;br&gt;
    FOREIGN KEY (emp_id) REFERENCES Employees(emp_id)&lt;br&gt;
);&lt;br&gt;
&lt;strong&gt;C. Composite Key&lt;/strong&gt;&lt;br&gt;
Uses multiple columns as a primary/foreign key.&lt;br&gt;
Useful when a single column isn’t unique enough.&lt;br&gt;
Example:&lt;br&gt;
SQL&lt;br&gt;
CREATE TABLE OrderDetails (&lt;br&gt;
    order_id INT,&lt;br&gt;
    product_id INT,&lt;br&gt;
    PRIMARY KEY (order_id, product_id)&lt;br&gt;
);&lt;br&gt;
&lt;strong&gt;D. Unique Key&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensures uniqueness but allows NULLs (unlike a PK).&lt;/li&gt;
&lt;li&gt;Helps avoid duplicate data in non-primary columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
SQL&lt;br&gt;
CREATE TABLE Users (&lt;br&gt;
    user_id INT PRIMARY KEY,&lt;br&gt;
    email VARCHAR(100) UNIQUE&lt;br&gt;
);&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How Keys Simplify Data Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster query performance&lt;/strong&gt;&lt;br&gt;
• Primary and Foreign Keys are indexed, making searches (WHERE, JOIN) faster.&lt;br&gt;
Example:&lt;br&gt;
SQL&lt;br&gt;
-- Quick lookup due to PK index&lt;br&gt;
SELECT * FROM Employees WHERE emp_id = 101;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accurate data relationships (JOINs)&lt;/strong&gt;&lt;br&gt;
• Foreign Keys enable seamless table linking for multi-table analysis.&lt;br&gt;
Example:&lt;br&gt;
SQL&lt;br&gt;
-- Find all orders by employee 'John'&lt;br&gt;
SELECT e.name, o.order_id&lt;br&gt;
FROM Employees e&lt;br&gt;
JOIN Orders o ON e.emp_id = o.emp_id&lt;br&gt;
WHERE e.name = 'John';&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ensures data integrity&lt;/strong&gt;&lt;br&gt;
• Prevents invalid data (for example, no orders for non-existent employees).&lt;br&gt;
Example:&lt;br&gt;
SQL&lt;br&gt;
-- This fails if emp_id 999 doesn’t exist in Employees&lt;br&gt;
INSERT INTO Orders (order_id, emp_id) VALUES (5, 999);&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplifies aggregation and reporting&lt;/strong&gt;&lt;br&gt;
• Grouping and filtering become efficient with indexed keys.&lt;br&gt;
Example:&lt;/p&gt;

&lt;p&gt;SQL&lt;br&gt;
-- Count orders per employee (uses PK-FK relationship)&lt;br&gt;
SELECT e.name, COUNT(o.order_id) AS total_orders&lt;br&gt;
FROM Employees e&lt;br&gt;
LEFT JOIN Orders o ON e.emp_id = o.emp_id&lt;br&gt;
GROUP BY e.name;&lt;br&gt;
&lt;strong&gt;3. Real-World Data Analysis Example&lt;/strong&gt;&lt;br&gt;
Scenario: Analyzing Sales Data&lt;br&gt;
Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customers (customer_id PK, name)&lt;/li&gt;
&lt;li&gt;Orders (order_id PK, customer_id FK, order_date)&lt;/li&gt;
&lt;li&gt;OrderItems (item_id PK, order_id FK, product_name, quantity)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Query: "Top 5 Customers by Total Purchases"&lt;br&gt;
SQL&lt;br&gt;
SELECT c.name, SUM(oi.quantity) AS total_items&lt;br&gt;
FROM Customers c&lt;br&gt;
JOIN Orders o ON c.customer_id = o.customer_id&lt;br&gt;
JOIN OrderItems oi ON o.order_id = oi.order_id&lt;br&gt;
GROUP BY c.name&lt;br&gt;
ORDER BY total_items DESC&lt;br&gt;
LIMIT 5;&lt;/p&gt;

&lt;p&gt;Why it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;PK-FK relationships ensure correct data linking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Indexed keys speed up the JOIN and GROUP BY.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
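&lt;p&gt;These ideas are easy to verify with Python's built-in sqlite3 module. The sketch below uses invented sample rows and enables foreign-key enforcement, which SQLite leaves off by default:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK checks enabled
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES Customers(customer_id));
    INSERT INTO Customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Referential integrity: an order for a non-existent customer is rejected
try:
    conn.execute("INSERT INTO Orders VALUES (13, 999)")
except sqlite3.IntegrityError:
    print("orphan order rejected")

# PK-FK join for aggregation, mirroring the article's orders-per-customer query
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS total_orders
    FROM Customers c JOIN Orders o ON c.customer_id = o.customer_id
    GROUP BY c.name ORDER BY total_orders DESC
""").fetchall()
print(rows)
```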

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
SQL keys are essential for:&lt;br&gt;
Maintaining data accuracy (no duplicates, valid references).&lt;br&gt;
Speeding up queries (indexed searches).&lt;br&gt;
Enabling complex analysis (multi-table JOINs, aggregations).&lt;br&gt;
By properly using Primary Keys, Foreign Keys, and Unique Keys, you turn raw data into structured, analyzable information, making business intelligence, reporting, and decision-making simpler and faster.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
