<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cyrus Ndungu</title>
    <description>The latest articles on DEV Community by Cyrus Ndungu (@cyrus_ndungu_79376c09c059).</description>
    <link>https://dev.to/cyrus_ndungu_79376c09c059</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3261040%2Fb6cd5295-d904-4deb-97d9-cedb861e843d.jpg</url>
      <title>DEV Community: Cyrus Ndungu</title>
      <link>https://dev.to/cyrus_ndungu_79376c09c059</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cyrus_ndungu_79376c09c059"/>
    <language>en</language>
    <item>
      <title>Power BI and PostgreSQL: Connecting Your Database for Data Analysis</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Sun, 08 Mar 2026 17:23:01 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/power-bi-and-postgresql-connecting-your-database-for-data-analysis-133c</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/power-bi-and-postgresql-connecting-your-database-for-data-analysis-133c</guid>
      <description>&lt;p&gt;In the current world, businesses around the world generate a lot of data daily. The ability to make sense of data in a quick and accurate manner can be the difference between a thriving company and a retrogressing one. This is where Power BI comes in.&lt;/p&gt;

&lt;p&gt;Power BI is a business intelligence tool made by Microsoft, primarily used for data visualizations. It allows individuals and organizations to connect to several data sources, transform data into meaningful insights, and present those insights using interactive dashboards and reports. It provides the tools to convert numbers into stories that decision makers can then act on.&lt;/p&gt;

&lt;p&gt;The drag-and-drop interface makes the tool accessible to non-technical users, while DAX (Data Analysis Expressions) and Power Query give data professionals the flexibility needed for complex transformations and calculations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting Power BI to a Local PostgreSQL Database
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Launch Power BI Desktop
&lt;/h3&gt;

&lt;p&gt;Open the Power BI Desktop application on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Select Get Data
&lt;/h3&gt;

&lt;p&gt;On the &lt;strong&gt;Home&lt;/strong&gt; ribbon at the top of the screen, click the &lt;strong&gt;Get Data&lt;/strong&gt; button. This opens a menu that gives you access to Power BI's wide range of supported data connectors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8l2uzdnwm6jm64wvwue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8l2uzdnwm6jm64wvwue.png" alt="Get Data button in the Home ribbon" width="218" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Choose PostgreSQL Database
&lt;/h3&gt;

&lt;p&gt;In the &lt;strong&gt;Get Data&lt;/strong&gt; window, type &lt;code&gt;PostgreSQL&lt;/code&gt; in the search bar. Select it and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzs3jty7c1im2o1sx8b4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzs3jty7c1im2o1sx8b4.png" alt="PostgreSQL option in the Get Data window" width="676" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Enter the Server Name and Database Name
&lt;/h3&gt;

&lt;p&gt;A dialog box will appear asking for the server address, which is typically &lt;code&gt;localhost&lt;/code&gt; for a local installation. To specify a particular port, append it to the server name (e.g. &lt;code&gt;localhost:5433&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;You will also be required to enter the name of the specific database you want to connect to.&lt;/p&gt;

&lt;p&gt;Additionally, you will be presented with the option to choose a &lt;strong&gt;Data Connectivity mode&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Import&lt;/strong&gt; — Loads a copy of the data into Power BI's internal model. It is generally faster; however, it requires scheduled refreshes to stay up to date.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DirectQuery&lt;/strong&gt; — Queries the database in real time whenever a report interaction occurs. This is useful for large datasets or when you need always-current data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Provide Your Credentials
&lt;/h3&gt;

&lt;p&gt;After the above steps, you will be asked to authenticate with the database. Select &lt;strong&gt;Database&lt;/strong&gt; as the credential type, enter your PostgreSQL username and password, then click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Select and Load the Tables
&lt;/h3&gt;

&lt;p&gt;After Step 5, the &lt;strong&gt;Navigator&lt;/strong&gt; window will open, displaying a list of all the tables and views available in your PostgreSQL database. Check the boxes next to the tables you want to bring into Power BI, then load them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting Power BI to a Cloud PostgreSQL Database (Microsoft Azure)
&lt;/h2&gt;

&lt;p&gt;Today, most production environments run on cloud infrastructure. Azure Database for PostgreSQL is Microsoft's fully managed cloud database service. To connect to it, follow the steps below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Get Connection Details
&lt;/h3&gt;

&lt;p&gt;Log in to the &lt;strong&gt;Azure Portal&lt;/strong&gt;, navigate to your PostgreSQL resource, and find the following details under &lt;strong&gt;Connection Strings&lt;/strong&gt; or &lt;strong&gt;Overview&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqassgy8fy4xjjp4qko5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqassgy8fy4xjjp4qko5d.png" alt="Azure PostgreSQL connection details" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Configure Firewall Access
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;Networking&lt;/strong&gt; in your Azure PostgreSQL resource and add your IP address under &lt;strong&gt;Firewall Rules&lt;/strong&gt;. If you are connecting through the Power BI Service, enable &lt;strong&gt;Allow access to Azure services&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3bbmnwfk7dkmi4bxv9j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3bbmnwfk7dkmi4bxv9j.png" alt="Azure Firewall Rules configuration" width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: SSL and Secure Connections
&lt;/h3&gt;

&lt;p&gt;SSL encrypts data travelling between Power BI and the database over the internet, protecting sensitive information from interception. Azure uses a trusted root certificate authority, and Power BI handles this automatically in most cases.&lt;/p&gt;
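
&lt;p&gt;If a connection fails with an SSL error, one way to verify that the server accepts encrypted connections from your machine is to test with &lt;code&gt;psql&lt;/code&gt; before troubleshooting in Power BI (the host, database, and user below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql "host=myserver.postgres.database.azure.com port=5432 dbname=mydb user=myuser sslmode=require"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;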

&lt;p&gt;Once these three steps are complete, open Power BI and follow the same steps as before — &lt;strong&gt;Get Data → PostgreSQL Database&lt;/strong&gt; — then enter your Azure host, database name, and credentials to connect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Modelling in Power BI
&lt;/h2&gt;

&lt;p&gt;After connecting and loading your tables, navigate to &lt;strong&gt;Model View&lt;/strong&gt; in Power BI. This view displays your tables as cards and allows you to define how they relate to one another. Power BI can detect some relationships automatically; however, you should review and create any that are missing by dragging a column from one table to the matching column in another table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Does Data Modelling Matter?
&lt;/h3&gt;

&lt;p&gt;When relationships are properly defined, Power BI knows how to filter data across tables automatically, ensuring accurate and consistent results across your reports and dashboards.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Are SQL Skills Essential for Power BI Analysis?
&lt;/h2&gt;

&lt;p&gt;Power BI provides a very useful visual interface for building dashboards. However, SQL is the language that communicates with relational databases, and it plays an important role at every stage of the Power BI workflow. Key roles include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieving the Right Data&lt;/strong&gt; — When connecting to a database like PostgreSQL, Power BI gives you the option to load entire tables or write a custom SQL query to retrieve only the data you need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering Datasets Before Loading&lt;/strong&gt; — The &lt;code&gt;WHERE&lt;/code&gt; clause allows analysts to filter data at the database level, reducing the volume of data loaded into Power BI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performing Aggregations&lt;/strong&gt; — This is achieved using functions like &lt;code&gt;SUM&lt;/code&gt;, &lt;code&gt;COUNT&lt;/code&gt;, &lt;code&gt;MIN&lt;/code&gt;, and &lt;code&gt;MAX&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preparing and Shaping Data&lt;/strong&gt; — SQL allows analysts to join multiple tables, create derived columns, handle null values, cast data types, and reshape data into the format required by Power BI.&lt;/li&gt;
&lt;/ul&gt;
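
&lt;p&gt;As a sketch of these ideas combined, the query below (against a hypothetical &lt;code&gt;sales&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; schema) joins, filters, and aggregates at the database level, so Power BI loads only the summarized result:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Summarize 2025 revenue per customer region before loading into Power BI
SELECT c.region,
       COUNT(*) AS orders,
       SUM(s.amount) AS total_revenue
FROM sales s
JOIN customers c ON c.customer_id = s.customer_id
WHERE s.order_date BETWEEN DATE '2025-01-01' AND DATE '2025-12-31'
GROUP BY c.region;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;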

</description>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>SQL Joins and Window Functions</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Mon, 02 Mar 2026 16:57:29 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/sql-joins-and-window-functions-1g7a</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/sql-joins-and-window-functions-1g7a</guid>
      <description>&lt;p&gt;Joins and window functions are two of the most powerful tools in SQL. Understanding them well is really key in the journey of becoming a data professional.&lt;br&gt;
A join combines rows from two or more tables based on a related column. The most common is the INNER JOIN, which returns only rows where a match exists in both tables. If you need to preserve all rows from one side regardless of a match, you use a LEFT or RIGHT JOIN — NULLs fill in where no match is found. A FULL OUTER JOIN goes further, returning all rows from both tables with NULLs on either side where matches are missing. Less common but worth knowing, a CROSS JOIN produces every possible combination of rows between two tables, and a SELF JOIN joins a table to itself — handy for hierarchical data like employee-manager relationships.&lt;br&gt;
Here's a join query combining employees with their departments:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT e.name, d.name AS department, e.salary&lt;br&gt;
FROM employees e&lt;br&gt;
LEFT JOIN departments d ON e.dept_id = d.id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Window functions compute values across a set of rows related to the current row — without collapsing the result like GROUP BY does. They use the OVER() clause, with PARTITION BY to define groups and ORDER BY to control row ordering within each group.&lt;/p&gt;

&lt;p&gt;Ranking functions like RANK(), DENSE_RANK(), and ROW_NUMBER() are among the most used. The difference is: RANK() skips numbers after a tie, while DENSE_RANK() does not. Aggregate functions like SUM() and AVG() can also be used as window functions. Add an ORDER BY inside OVER() and they become cumulative, perfect for running totals. LAG and LEAD let you look at previous or next row values without a self-join, making period-over-period comparisons simple. Functions like FIRST_VALUE and NTILE round out the toolkit for benchmarking and bucketing data into equal groups.&lt;br&gt;
Below is an example of a window function showing each student's score alongside the class average and their rank, without losing any rows, assuming a table of students and their exam scores:&lt;br&gt;
&lt;code&gt;SELECT&lt;br&gt;
    name,&lt;br&gt;
    subject,&lt;br&gt;
    score,&lt;br&gt;
    ROUND(AVG(score) OVER (PARTITION BY subject), 2) AS class_average,&lt;br&gt;
    RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS class_rank&lt;br&gt;
FROM student_scores;&lt;/code&gt;&lt;/p&gt;
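
&lt;p&gt;As a further sketch, LAG makes period-over-period comparisons simple. Assuming a hypothetical &lt;code&gt;monthly_sales&lt;/code&gt; table with &lt;code&gt;month&lt;/code&gt; and &lt;code&gt;revenue&lt;/code&gt; columns, the query below compares each month against the previous one:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;br&gt;
    month,&lt;br&gt;
    revenue,&lt;br&gt;
    LAG(revenue) OVER (ORDER BY month) AS previous_month,&lt;br&gt;
    revenue - LAG(revenue) OVER (ORDER BY month) AS change&lt;br&gt;
FROM monthly_sales;&lt;/code&gt;&lt;/p&gt;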

</description>
      <category>datascience</category>
      <category>data</category>
    </item>
    <item>
      <title>How analysts translate messy data, DAX, and dashboards into action using Power BI</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Mon, 09 Feb 2026 04:34:49 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-99o</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-99o</guid>
      <description>&lt;p&gt;Power BI is a decision pipeline: it takes messy, siloed inputs and turns them into measurable, operational outcomes. The difference between dashboards that look nice and dashboards that change behavior is an outcomes-first approach, disciplined data work, clear semantic modeling, purposeful DAX, and operational integration.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with the decision&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Define the decision the dashboard must enable, who will act, and the concrete action expected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Translate that into one primary KPI, supporting metrics, and clear thresholds tied to actions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Tame messy data (Power Query)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Profile sources to find nulls, inconsistent types, and outliers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean and standardize fields (dates, categories, numeric types).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deduplicate and reconcile records; use fuzzy matching where needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make transforms repeatable and traceable with parameters, functions, and source metadata.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use incremental refresh and early aggregation for large volumes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Build a trustworthy model&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Structure data in a star-schema: facts for events/transactions and dimensions for entities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a dedicated Date table and mark it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prefer single-direction relationships and reduce unnecessary cardinality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove unused columns and maintain clear, documented relationships.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Use DAX for business logic (measures over columns)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Encapsulate dynamic, context-sensitive calculations as measures so results respond correctly to filters and visuals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep DAX readable and performant: use variables, avoid needless row-by-row iteration, and handle edge cases (e.g., divisions by zero).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document complex measures so maintainers and stakeholders understand the logic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
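
&lt;p&gt;A minimal sketch of these habits, assuming a hypothetical Sales fact table: variables hold intermediate results once, and DIVIDE handles the zero-denominator edge case:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Average Order Value =
VAR TotalRevenue = SUM(Sales[Revenue])          // total over the current filter context
VAR OrderCount   = DISTINCTCOUNT(Sales[OrderID])
RETURN
    DIVIDE(TotalRevenue, OrderCount)            // returns BLANK instead of an error when OrderCount is 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;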

&lt;ol&gt;
&lt;li&gt;Design dashboards to prompt action&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Lead with a one-page decision view: the KPI, its trend, and the top drivers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make next steps explicit: display the owner, required action, and conditional highlights tied to thresholds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable quick drill paths and focused views for investigation without overwhelming the top-level page.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Operationalize insights&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Connect dashboards to workflows: alerts, subscriptions, and Power Automate flows that create tickets, notify teams, or update trackers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Surface accountability: owners, status, and an action log on or linked from the dashboard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure viewers can quickly move from insight to a recorded action.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Performance, governance, and quality&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optimize model size and query performance by removing unused fields, using aggregates, and tuning DAX.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply row-level security, document data sources and transformations, and use deployment pipelines or version control for PBIX assets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test ETL and measures with representative data and regression checks after changes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Measure impact and iterate&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Track usage and business outcomes: who uses the dashboard, what actions were taken, and whether KPIs moved.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Treat dashboards as products: collect feedback, prioritize improvements, and release updates with measurable goals.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wrap-up&lt;/p&gt;

&lt;p&gt;Turning messy data into action with Power BI requires technical rigor and operational design. Clean, auditable ETL; a clear semantic model; robust, documented DAX; and dashboards built around decisions — connected to workflows and ownership — are the levers that move teams from insight to impact.&lt;/p&gt;

</description>
      <category>data</category>
    </item>
    <item>
      <title>A pianist’s take on Power BI: Schemas &amp; data modelling made musical 🎹</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Sun, 01 Feb 2026 17:55:49 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/a-pianists-take-on-power-bi-schemas-data-modelling-made-musical-2hhm</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/a-pianists-take-on-power-bi-schemas-data-modelling-made-musical-2hhm</guid>
      <description>&lt;p&gt;Hi — I’m someone who spends more than a little time at the keyboard. When I arrange a tune I think about structure (intro, verse, chorus, bridge) and how the parts fit together so the melody breathes. Data modelling in Power BI is the same kind of craft: if the foundation is good, the report performs and the insights sing. Below I’ll walk you through schemas, fact &amp;amp; dimension tables, relationships, and why good modelling matters — in plain, friendly language with practical tips you can use right away.&lt;/p&gt;




&lt;p&gt;What you’ll get from this article&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Clear definitions of fact tables, dimension tables, star and snowflake schemas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How relationships work in Power BI (direction, cardinality, many-to-many)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why modelling affects performance and correctness (real-world examples)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A practical, step-by-step recipe to design a clean Power BI model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quick checklist and troubleshooting tips&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Facts and dimensions — the melody and the harmony&lt;/p&gt;

&lt;p&gt;Think of a report like a song:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The fact table is the melody — the events you measure (sales, clicks, shipments).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The dimension tables are the harmonies — the context (dates, customers, products, regions).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: a simple sales model&lt;/p&gt;

&lt;p&gt;FactSales (the melody)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OrderID, OrderLineID, DateKey, CustomerKey, ProductKey, Quantity, Revenue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DimDate (harmony)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DateKey, FullDate, Month, Quarter, Year&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DimCustomer (harmony)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CustomerKey, CustomerName, Segment, Region&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DimProduct (harmony)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ProductKey, ProductName, Category, Brand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fact = many rows, numeric measures, foreign keys to dims.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dimension = relatively few rows, descriptive attributes, primary key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When fact and dims are aligned by consistent keys and grain, queries are simple and correct.&lt;/p&gt;




&lt;p&gt;Star schema — the classic pop song (simple &amp;amp; fast)&lt;/p&gt;

&lt;p&gt;A star schema has one central fact table with dimension tables radiating out. It’s the most common and recommended pattern for Power BI.&lt;/p&gt;

&lt;p&gt;Visual (ASCII):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DimDate
         |
DimCustomer — FactSales — DimProduct
         |
      DimRegion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Why star schema works well in Power BI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fewer joins → faster queries in VertiPaq (the in-memory engine).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DAX measures are simpler because relationships are straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Good for aggregation (rollups by date, product, customer).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy for report consumers to understand.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When to use it: Most analytics and reporting scenarios where speed and simplicity matter.&lt;/p&gt;




&lt;p&gt;Snowflake schema — the classical piece (normalized)&lt;/p&gt;

&lt;p&gt;A snowflake schema normalizes dimensions into multiple tables. Example: DimProduct → DimCategory → DimSubcategory.&lt;/p&gt;

&lt;p&gt;Why you might choose snowflake:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Less redundancy; easier to maintain when attributes change frequently across many items.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smaller dimension tables (sometimes saving storage).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why it can slow things down in Power BI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;More joins increase query complexity and can slow VertiPaq queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DAX can become more complex when traversing normalized hierarchies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rule of thumb: prefer star for analytical models in Power BI. Use snowflake only when normalization gives clear maintenance or governance benefits.&lt;/p&gt;




&lt;p&gt;Relationships — the chord progressions of your model&lt;/p&gt;

&lt;p&gt;Relationships tell Power BI how tables connect. Important concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cardinality: One-to-many (1:*), many-to-one (*:1), many-to-many (*:*). Most common is 1:* (dimension → fact).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cross-filter direction: single or both. Single is safer and faster; both (bidirectional) can be convenient but may introduce ambiguity and performance issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Active vs inactive relationships: only active relationships filter by default. USERELATIONSHIP in DAX can activate an inactive relationship in a calculation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relationship keys: use surrogate numeric keys (integers) for best performance; avoid text-based keys for relationships if possible.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples (DAX)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Basic measure&lt;/p&gt;

&lt;p&gt;Total Revenue = SUM(FactSales[Revenue])&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use an alternate date relationship&lt;/p&gt;

&lt;p&gt;Total Sales by Ship Date =&lt;br&gt;
CALCULATE(&lt;br&gt;
  [Total Revenue],&lt;br&gt;
  USERELATIONSHIP(DimDate[DateKey], FactSales[ShipDateKey])&lt;br&gt;
)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many-to-many: use a bridge table or composite model to avoid ambiguous filters and double counting.&lt;/p&gt;




&lt;p&gt;Grain matters — set the right level for facts&lt;/p&gt;

&lt;p&gt;The “grain” of a fact table defines what a single row represents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Order line (one row per SKU per order)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Order header (one row per order)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily aggregated sales (one row per product per day)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If granularity is inconsistent across tables or measures, you’ll get wrong numbers (double counts, weird averages). Always:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Decide the grain early.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep the fact table at the lowest necessary grain for your reports.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use aggregated tables for faster summary reports if needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Performance — why modelling makes or breaks speed&lt;/p&gt;

&lt;p&gt;Power BI uses VertiPaq: a columnar, in-memory engine with dictionary encoding and compression. Good modelling optimizes those internals.&lt;/p&gt;

&lt;p&gt;Practical performance rules&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Remove unnecessary columns (they increase memory).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prefer numeric surrogate keys — smaller dictionaries and faster joins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduce cardinality where possible (high-cardinality columns are expensive).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use star schema so queries join fewer tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep dimension attributes that you use in visuals; move rarely used attributes to a separate table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Import mode for best performance; DirectQuery has a runtime dependency on the source and limits optimization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use incremental refresh for large fact tables.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Advanced tools&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Aggregations: create pre-aggregated summary tables for high-level reports and let Power BI route queries to them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Composite models &amp;amp; Dual storage mode: combine Import and DirectQuery, use dual to optimize lookup tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VertiPaq Analyzer or Power BI Performance Analyzer to find bottlenecks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Concrete benefit: a well-designed star model can reduce dataset size drastically and cut query times from minutes to seconds.&lt;/p&gt;




&lt;p&gt;Accuracy — avoid the false harmonies&lt;/p&gt;

&lt;p&gt;Bad modelling doesn’t just slow you down — it misleads.&lt;/p&gt;

&lt;p&gt;Common accuracy pitfalls&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Duplicate or inconsistent dimension keys (e.g., “John Smith” vs “john smith”) → wrong joins and inflated counts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mixing granularities in measures (summing order lines then counting orders without DISTINCT) → double counts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using many-to-many without careful bridging → incorrect aggregations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relying on bi-directional filters to “fix” an issue — it may mask poor model design.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How to validate&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Do spot checks: compare totals in your fact table vs a simple SUM in the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify DISTINCTCOUNT(OrderID) between source and model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use small test measures to assert expected behavior before building complex visuals.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
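
&lt;p&gt;The validation ideas above can be expressed as small test measures (table and column names here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Compare these against a trusted extract from the source system
Check Total Revenue = SUM(FactSales[Revenue])
Check Order Count   = DISTINCTCOUNT(FactSales[OrderID])
Check Row Count     = COUNTROWS(FactSales)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;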




&lt;p&gt;Practical recipe: build a clean Power BI model (step-by-step)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
In Power Query: clean &amp;amp; shape&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Remove unused columns.

- Standardize keys and text (trim, clean, proper case).

- Ensure dates are real date types.

- Aggregate if you can reduce grain safely.

- Create surrogate keys if necessary (e.g., ProductKey).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
Create a Date table (essential)&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Generate a full date dimension with Year, Quarter, Month, Fiscal columns.

- Mark it as Date table in Model view (Modeling → Mark as Date table).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
Load fact(s) and dimension(s)&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- FactSales should have foreign keys to dims.

- Ensure keys are the right data type (whole number for IDs).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
In Model view:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Create one-to-many relationships from dimension → fact.

- Set cross-filter to single direction unless you have a specific reason.

- Hide technical key columns from report view.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
Create measures (not calculated columns) where possible&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Measures calculate on the fly and are memory efficient.


Total Revenue = SUM(FactSales[Revenue])
Orders = DISTINCTCOUNT(FactSales[OrderID])
Average Order Value = DIVIDE([Total Revenue], [Orders])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
Test for correctness&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Compare totals to source extracts.

- Validate a few sample customers/products/dates.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
Optimize&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Remove unused columns/tables.

- Consider aggregations for very large datasets.

- Use incremental refresh for historical fact data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Handling special cases&lt;/p&gt;

&lt;p&gt;Slowly Changing Dimensions (SCD)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Type 1: overwrite attributes (current view only)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type 2: store history with row-effective dates or version keys — useful when you need historical reporting at the same grain as facts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
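&lt;p&gt;A Type 2 dimension might look like this (column names are hypothetical, for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CustomerKey | CustomerID | Segment | ValidFrom  | ValidTo    | IsCurrent
1           | C001       | Retail  | 2023-01-01 | 2024-06-30 | 0
2           | C001       | Premium | 2024-07-01 | 9999-12-31 | 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The fact table stores the surrogate &lt;code&gt;CustomerKey&lt;/code&gt; that was current at the time of each transaction, so historical reports stay accurate.&lt;/p&gt;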

&lt;h3&gt;
  
  
  Role-playing dimensions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Date dimension can play several roles: order date, ship date, invoice date. Keep separate foreign keys in the fact table and use USERELATIONSHIP for the alternate measures.&lt;/li&gt;
&lt;/ul&gt;
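&lt;p&gt;A minimal sketch of the alternate-date pattern, assuming an inactive relationship exists on a &lt;code&gt;FactSales[ShipDateKey]&lt;/code&gt; column:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue by Ship Date =
CALCULATE (
    [Total Revenue],
    USERELATIONSHIP ( FactSales[ShipDateKey], DimDate[DateKey] )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;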

&lt;h3&gt;
  
  
  Many-to-many
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use a bridge (junction) table or composite model; avoid ad-hoc bidirectional relationships.&lt;/li&gt;
&lt;/ul&gt;
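&lt;p&gt;One hedged pattern: keep the physical relationships single-direction and activate bidirectional filtering only inside the measures that need the bridge (table and column names here are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue by Segment =
CALCULATE (
    [Total Revenue],
    CROSSFILTER ( BridgeCustomerSegment[CustomerKey], DimCustomer[CustomerKey], BOTH )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;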

&lt;h3&gt;
  
  
  DirectQuery &amp;amp; Composite models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DirectQuery keeps data at source (good for real-time but slower).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Composite models allow mixing Import and DirectQuery to get the best of both worlds.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick checklist — tune-up before publishing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Grain of fact table defined and documented&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Star schema (or justified snowflake) in place&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Date table present and marked as such&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relationships 1:* with single direction (unless required)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numeric surrogate keys used for joins&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unused columns removed &amp;amp; hidden from report view&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Measures created for aggregation (not unnecessary calculated columns)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test totals match source system for several samples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance validated (Performance Analyzer)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Troubleshooting common issues (quick tips)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Wrong totals? Check relationship direction and active relationships.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Duplicate counts? Check grain and use DISTINCTCOUNT.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slow visuals? Remove high-cardinality columns from visuals, consider aggregation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Many-to-many confusion? Introduce a bridge table and use measures carefully.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final chord — why this matters
&lt;/h2&gt;

&lt;p&gt;Good modelling is the sheet music for your data. When you model well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reports are fast and responsive (your audience stays engaged).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numbers are correct and trustworthy (your stakeholders have confidence).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DAX stays readable and maintainable (you can iterate quickly).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Future changes are easier — like modulating into a new key without breaking the song.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start simple: build a clean star schema, treat the date table as sacred, use measures, and optimize only where you need to. As a keyboard player, I know how freeing it feels to have the skeleton of a good chord progression — you can improvise wonders on top. The same is true for your data: solid structure unlocks creativity.&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>data</category>
    </item>
    <item>
      <title>A Guide to Git and GitHub for Data Analysts</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Sat, 17 Jan 2026 08:59:54 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/a-guide-to-git-and-github-for-data-analysts-2n1a</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/a-guide-to-git-and-github-for-data-analysts-2n1a</guid>
      <description>&lt;h2&gt;
  
  
  A Guide to Git and GitHub for Data Analysts
&lt;/h2&gt;

&lt;p&gt;In the world of software engineering, writing code is only half the battle. The other half is managing that code—tracking its evolution, collaborating with others, and preventing data loss, which can be catastrophic. This is where &lt;strong&gt;Version Control&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What is Git and Why Version Control Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version Control&lt;/strong&gt; is a system that records changes to a file or set of files over time so that you can recall specific versions later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git&lt;/strong&gt; is a &lt;em&gt;Distributed Version Control System (DVCS)&lt;/em&gt;. Unlike a central server where files are locked, every developer's computer has a full copy of the code history.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is this important?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "Undo" Button:&lt;/strong&gt; If you break your code at 2:00 AM, you can instantly revert the project to the state it was in at 10:00 PM. Isn't that exciting?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collaboration:&lt;/strong&gt; Multiple data analysts can work on the same file simultaneously, and Git merges (combines) their changes for you.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Branching:&lt;/strong&gt; You can create parallel universes (branches) to test crazy ideas without breaking the main working code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; It tells you &lt;em&gt;who&lt;/em&gt; wrote a line of code, &lt;em&gt;when&lt;/em&gt;, and importantly, &lt;em&gt;why&lt;/em&gt; (via commit messages).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on Git vs. GitHub:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git&lt;/strong&gt; is the tool (the software installed on your machine).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt; is the service (a website that hosts Git repositories in the cloud). Think of it as: Git is MP3, GitHub is Spotify.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. How to Track Changes (The Git Workflow)
&lt;/h2&gt;

&lt;p&gt;Tracking changes in Git follows a three-stage process. Imagine you are packing a moving truck:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Working Directory:&lt;/strong&gt; Where you edit files (the rooms of your house).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staging Area (Index):&lt;/strong&gt; Where you choose what to save (the boxes you pack).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repository (HEAD):&lt;/strong&gt; The local history of saved snapshots, or commits (the loaded truck), stored in the hidden &lt;code&gt;.git&lt;/code&gt; folder.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Commands
&lt;/h3&gt;

&lt;p&gt;First, initialize Git in your project folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the status of your files (your "dashboard"):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step A: Staging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Move changes from the Working Directory to the Staging Area.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add a specific file&lt;/span&gt;
git add main.py

&lt;span class="c"&gt;# OR add all changed files in the current directory&lt;/span&gt;
git add &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step B: Committing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Seal the snapshot. This creates a permanent record in the history graph (a node in the tree).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Implement the quadratic formula function"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;-m&lt;/code&gt; flag allows you to write a message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Practice:&lt;/strong&gt; Write messages in the imperative mood (e.g., "Add feature" not "Added feature").&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. How to Push Code to GitHub
&lt;/h2&gt;

&lt;p&gt;"Pushing" is the act of uploading your local repository history to a remote server (GitHub).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; Create a new &lt;strong&gt;empty&lt;/strong&gt; repository on GitHub.com.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step A: Connect Local to Remote
&lt;/h3&gt;

&lt;p&gt;You need to tell your local Git where the GitHub server is. We usually name the remote server &lt;code&gt;origin&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git remote add origin https://github.com/cyrusz55/my-project.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step B: Push the Code
&lt;/h3&gt;

&lt;p&gt;Send your committed changes up to GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;origin&lt;/code&gt;: The destination (GitHub).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;main&lt;/code&gt;: The branch you are sending (the default branch was historically named &lt;code&gt;master&lt;/code&gt;; new repositories now default to &lt;code&gt;main&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-u&lt;/code&gt;: Sets the "upstream." After doing this once, you can simply type &lt;code&gt;git push&lt;/code&gt; in the future.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. How to Pull Code from GitHub
&lt;/h2&gt;

&lt;p&gt;"Pulling" is downloading data from GitHub to your computer. There are two scenarios for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario A: Starting from scratch (&lt;code&gt;git clone&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;If you are on a new computer or joining a new project, you need to download the entire repository history.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/cyrusz55/my-project.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single command runs &lt;code&gt;git init&lt;/code&gt;, creates the &lt;code&gt;origin&lt;/code&gt; remote link, and downloads the entire history in one go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario B: Updating existing code (&lt;code&gt;git pull&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;If you already have the folder, but your teammate pushed new code (or you pushed code from a different computer), you need to update your current setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git pull origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This fetches the new changes and immediately merges them into your local files.&lt;/p&gt;
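&lt;p&gt;Under the hood, &lt;code&gt;git pull&lt;/code&gt; is a two-step operation: &lt;code&gt;git fetch&lt;/code&gt; (download) followed by &lt;code&gt;git merge&lt;/code&gt; (combine). The sketch below demonstrates it end to end using a throwaway local "remote" (all paths and names are made up for the demo):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;set -e
tmp=$(mktemp -d)

# Build a throwaway "remote" repository with one commit
git init --quiet "$tmp/remote"
git -C "$tmp/remote" config user.email demo@example.com
git -C "$tmp/remote" config user.name "Demo"
echo "first" &amp;gt; "$tmp/remote/notes.txt"
git -C "$tmp/remote" add notes.txt
git -C "$tmp/remote" commit --quiet -m "Add notes"

# Clone it, then add a second commit on the remote
git clone --quiet "$tmp/remote" "$tmp/local"
echo "second" &amp;gt;&amp;gt; "$tmp/remote/notes.txt"
git -C "$tmp/remote" commit --quiet -am "Update notes"

# git pull = git fetch (download) + git merge (combine)
git -C "$tmp/local" fetch --quiet origin
git -C "$tmp/local" merge --quiet FETCH_HEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After the merge, the local clone has both commits, exactly as a single &lt;code&gt;git pull&lt;/code&gt; would have produced.&lt;/p&gt;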




&lt;h2&gt;
  
  
  Summary Cheatsheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Start Git&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git init&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Check status&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git status&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stage files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git add .&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Save snapshot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git commit -m "message"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Download repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git clone &amp;lt;url&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upload changes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git push&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update local&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git pull&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Happy coding! 🚀&lt;/em&gt;&lt;/p&gt;

</description>
      <category>git</category>
      <category>github</category>
      <category>data</category>
      <category>luxdevhq</category>
    </item>
    <item>
      <title>How Excel is Used in Real-World Data Analysis</title>
      <dc:creator>Cyrus Ndungu</dc:creator>
      <pubDate>Thu, 12 Jun 2025 21:35:43 +0000</pubDate>
      <link>https://dev.to/cyrus_ndungu_79376c09c059/how-excel-is-used-in-real-world-data-analysis-1gjg</link>
      <guid>https://dev.to/cyrus_ndungu_79376c09c059/how-excel-is-used-in-real-world-data-analysis-1gjg</guid>
      <description>&lt;h1&gt;
  
  
  My First Week Exploring Excel: Turning Numbers into Insight
&lt;/h1&gt;

&lt;p&gt;Hello, my name is &lt;strong&gt;Cyrus Ndung'u&lt;/strong&gt;. Over the past week, I’ve been immersing myself in the vast and fascinating world of data—specifically &lt;strong&gt;Microsoft Excel&lt;/strong&gt;. The experience has been exciting and deeply engaging. Even the few challenges I encountered made the journey more interesting, because each obstacle pushed me to learn something new and rewarding.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction to Excel
&lt;/h2&gt;

&lt;p&gt;As a pianist and music lover, part of my responsibility is transforming scattered notes into a harmonious symphony. In a similar way, Excel helps transform raw numbers into meaningful insights that drive real-world decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Excel&lt;/strong&gt; is a powerful spreadsheet application and a cornerstone of data analysis across many industries. It enables users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input and organize data clearly,&lt;/li&gt;
&lt;li&gt;perform calculations using formulas and functions,&lt;/li&gt;
&lt;li&gt;visualize information with charts and formatting,&lt;/li&gt;
&lt;li&gt;and analyze trends to support better decision-making.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Excel blends structure and creativity—allowing analysts to explore data with both precision and imagination.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Applications of Excel in Data Analysis
&lt;/h2&gt;

&lt;p&gt;Excel is widely used because it can support real, practical work in many fields. Here are a few common examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Business Decision-Making
&lt;/h3&gt;

&lt;p&gt;Organizations rely on Excel to analyze sales trends, track performance metrics, and forecast growth. Managers can compare results across departments, identify patterns, and make strategic decisions that influence overall success.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Financial Reporting and Analysis
&lt;/h3&gt;

&lt;p&gt;Financial professionals use Excel for budgeting, financial modeling, and reporting to stakeholders. It supports tasks such as analyzing investment portfolios, calculating ratios, and producing detailed summaries that guide major business and investment choices.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Marketing Performance Tracking
&lt;/h3&gt;

&lt;p&gt;Marketing teams use Excel to evaluate campaign effectiveness through metrics like conversion rates, customer acquisition cost, and return on investment (ROI). With structured tracking, teams can improve strategies by learning what works—and what doesn’t.&lt;/p&gt;




&lt;h2&gt;
  
  
  Essential Excel Features for Data Analysis
&lt;/h2&gt;

&lt;p&gt;Excel offers many features that strengthen analytical work. Three that stand out are:&lt;/p&gt;

&lt;h3&gt;
  
  
  VLOOKUP
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VLOOKUP&lt;/strong&gt; helps search for a value in a table and return related information from another column. It is especially useful when combining data from multiple sheets or retrieving specific records quickly.&lt;/p&gt;
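&lt;p&gt;As a small illustration (the sheet and range names here are made up): to pull a product's price from a &lt;code&gt;Products&lt;/code&gt; sheet where column A holds the product ID and column C the price, you could write:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=VLOOKUP(A2, Products!A:C, 3, FALSE)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;FALSE&lt;/code&gt; argument forces an exact match, which is almost always what you want when joining records.&lt;/p&gt;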

&lt;h3&gt;
  
  
  Pivot Tables
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pivot tables&lt;/strong&gt; are powerful summarization tools that help analysts reorganize, group, and filter large datasets. They make it easier to view data from multiple perspectives and uncover trends that aren’t obvious in raw tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conditional Formatting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Conditional formatting&lt;/strong&gt; uses visual cues—such as colors, icons, and data bars—to highlight patterns, trends, and outliers. It helps important insights stand out immediately, making analysis faster and clearer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Personal Reflection
&lt;/h2&gt;

&lt;p&gt;Learning Excel has changed how I see data. Where I once saw overwhelming spreadsheets full of numbers, I now see &lt;strong&gt;stories waiting to be told&lt;/strong&gt;. I’ve realized that effective data analysis is more than technical skill—it also requires creative problem-solving.&lt;/p&gt;

&lt;p&gt;In many ways, data feels like music: it has rhythm, patterns, and relationships. Just as musical compositions follow structure and harmony, datasets also contain patterns that can be discovered and interpreted. This connection has made learning Excel more intuitive and enjoyable for me.&lt;/p&gt;

&lt;p&gt;Turning raw information into meaningful insights feels remarkably similar to creating music. Both require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding underlying structure,&lt;/li&gt;
&lt;li&gt;recognizing patterns,&lt;/li&gt;
&lt;li&gt;and presenting results in a way that resonates with an audience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This perspective has strengthened my curiosity and motivation to keep learning—especially to explore more advanced Excel features. This journey has taught me that data analysis isn’t just about formulas and numbers; it’s about discovering what the data is saying and using those insights to make better decisions in a data-driven world.&lt;/p&gt;




</description>
      <category>machinelearning</category>
      <category>database</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
