DEV Community: GRACE MUTHONI MWANGI

Mastering joins and window functions on PostgreSQL

GRACE MUTHONI MWANGI — Mon, 02 Mar 2026 03:12:39 +0000

JOINS

Joins are SQL clauses used to combine rows from two or more tables (or views) based on a related column.

There are numerous join types in PostgreSQL. The way to determine the best join to select mainly depends on the following:

The output that you would like displayed from the tables that are joined, including the unmatched records.
The relationship between the tables and/or views, including the common columns.

The examples below use the projects, employees and departments tables in the sales_data schema.

Types of joins on PostgreSQL:

Cross join - This join combines every row from one table with every row of the other table.

For example:

select p.project_name, e.salary, e.name, e.employee_id from sales_data.projects p cross join sales_data.employees e ;

The number of rows returned is the product of the row counts of the 2 tables.
The 2 tables each have 5 rows, so the result generated by the code has 25 rows.
This join type doesn't need an on clause to match a common column between the 2 tables; many term it a non-matching join.
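
The row-count multiplication can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the two five-row tables and their contents are made up for illustration; they are not the article's actual data.

```python
import sqlite3

# Hypothetical stand-ins for sales_data.projects and sales_data.employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects  (project_id INTEGER, project_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, name TEXT);
    INSERT INTO projects  VALUES (1,'Alpha'),(2,'Beta'),(3,'Gamma'),(4,'Delta'),(5,'Omega');
    INSERT INTO employees VALUES (1,'Alice'),(2,'Bob'),(3,'Carol'),(4,'Dan'),(5,'Eve');
""")

# CROSS JOIN has no ON clause: every project is paired with every employee.
cross_rows = conn.execute(
    "SELECT p.project_name, e.name FROM projects p CROSS JOIN employees e"
).fetchall()

print(len(cross_rows))  # 5 projects x 5 employees = 25
```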

Inner join - This join typically returns the matching values from both tables. This simply means that if there is no match in either table, then the row is excluded from the results.

For example:
select emp.name, dep.department_id, dep.department_name from sales_data.employees emp inner join sales_data.departments dep on emp.department_id = dep.department_id;

The employee called Eve is not yet assigned to any department; therefore, her name is excluded from the results.
Additionally, the Finance department doesn't have an employee linked to it and is therefore also not displayed in the query's result.
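
A minimal sketch of the same behaviour, run against an in-memory SQLite database instead of PostgreSQL. The rows are hypothetical and only mirror the situation described above: one employee without a department and one department without employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees   (employee_id INTEGER, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1,'Sales'),(2,'IT'),(3,'Finance');
    INSERT INTO employees   VALUES (1,'Alice',1),(2,'Bob',2),(3,'Eve',NULL);
""")

# Only rows with a match on both sides survive an inner join.
inner_rows = conn.execute("""
    SELECT emp.name, dep.department_name
    FROM employees emp
    INNER JOIN departments dep ON emp.department_id = dep.department_id
    ORDER BY emp.name
""").fetchall()

print(inner_rows)  # [('Alice', 'Sales'), ('Bob', 'IT')]
```

Eve (no department) and Finance (no employees) are both missing from the output.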

Left outer join - This join, also referred to as a left join, returns all the rows from the left table (mentioned 1st) and the matching values from the right table (mentioned 2nd). Where a left-table row has no match, the right-table columns are returned as NULL. For example:

select * from sales_data.projects p left outer join sales_data.employees e on p.employee_id = e.employee_id;

select * from sales_data.projects p left join sales_data.employees e on p.employee_id = e.employee_id;

The left outer join from this code returns all the rows from the project table, including the projects that aren't assigned to any particular employee.
Additionally, not all employees appear in the query result because some employees don't have projects assigned to them yet.
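
The sketch below reproduces this on a tiny, made-up dataset using SQLite through Python's sqlite3 module (an assumption made so the example is runnable; the query itself is also valid in PostgreSQL). An unassigned project is kept with NULL (None in Python) for the employee columns, while an employee without a project does not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, name TEXT);
    CREATE TABLE projects  (project_id INTEGER, project_name TEXT, employee_id INTEGER);
    INSERT INTO employees VALUES (1,'Alice'),(2,'Bob');
    INSERT INTO projects  VALUES (1,'Alpha',1),(2,'Beta',NULL);
""")

# Every project is returned; employee columns are NULL where nothing matches.
left_rows = conn.execute("""
    SELECT p.project_name, e.name
    FROM projects p
    LEFT JOIN employees e ON p.employee_id = e.employee_id
    ORDER BY p.project_id
""").fetchall()

print(left_rows)  # [('Alpha', 'Alice'), ('Beta', None)]
```

Swapping the two table names in the query gives the result a right join would return for the original table order.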

Right outer join - This join, also referred to as a 'right join', typically returns all rows on the right (table mentioned last) and the matching values from the left table. For example:

select * from sales_data.projects p right outer join sales_data.employees e on p.employee_id = e.employee_id;

select * from sales_data.projects p right join sales_data.employees e on p.employee_id = e.employee_id;

The right outer join from this code returns all the rows from the employees table, positioned on the right.
This means that all employees are listed alongside the projects they are assigned to; however, projects that aren't assigned to any employee yet are left out.

Full outer join - This join, also referred to as a full join, returns all the rows from both tables in the query. Where a row has no match in the other table, that table's columns are filled with NULL. The columns of the first table mentioned appear on the left. For example:

select * from sales_data.projects p full outer join sales_data.employees e on p.employee_id = e.employee_id;

select * from sales_data.projects p full join sales_data.employees e on p.employee_id = e.employee_id;

This code returns all the rows from the projects table, plus extra rows for the employees who aren't assigned to any project yet.

Natural join - This join matches rows on every column that has the same name in both tables and, by default, behaves like an inner join. It does not look at primary or foreign keys; it only compares column names. This means unrelated columns that happen to share a name are compared too, and if no column names are shared the result is the same as a cross join. For example:

select * from sales_data.employees e natural join sales_data.projects p;
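
A short sketch of why natural joins need care. It assumes, purely for illustration, that both tables have a column called name in addition to employee_id; the article's real tables may differ. SQLite (via sqlite3) is used here as a runnable stand-in and treats NATURAL JOIN the same way: all shared column names take part in the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, name TEXT);
    CREATE TABLE projects  (project_id INTEGER, name TEXT, employee_id INTEGER);
    INSERT INTO employees VALUES (1,'Alice'),(2,'Bob');
    INSERT INTO projects  VALUES (10,'Alpha',1),(11,'Beta',2);
""")

# NATURAL JOIN compares employee_id AND name, so nothing matches.
natural_rows = conn.execute(
    "SELECT * FROM employees NATURAL JOIN projects"
).fetchall()

# An explicit ON clause joins on the intended column only.
explicit_rows = conn.execute("""
    SELECT e.name, p.name
    FROM employees e
    JOIN projects p ON e.employee_id = p.employee_id
    ORDER BY e.employee_id
""").fetchall()

print(natural_rows)   # []
print(explicit_rows)  # [('Alice', 'Alpha'), ('Bob', 'Beta')]
```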

Enriching join queries

You can additionally enrich the joins by adding different clauses. For example:

WHERE clause:

Using the WHERE clause to filter values from the tables as shown below:

select * from sales_data.projects p full join sales_data.employees e on p.employee_id = e.employee_id where e.employee_id <4;

In the query above, the where clause filters by the employee_id column, only displaying rows with an employee_id lower than 4. Rows where e.employee_id is NULL (projects with no matching employee) are dropped as well, because a comparison with NULL is never true.
It is important to note that the where clause filters individual rows before any grouping; it cannot reference an aggregate.
To filter on an aggregate, use the having clause, e.g.

select c.first_name, c.last_name, SUM(s.quantity_sold) as units_sold from assignment.customers c inner join assignment.sales s on c.customer_id = s.customer_id group by s.customer_id,c.first_name, c.last_name having SUM(s.quantity_sold) > 5;
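
The difference between the two clauses can be seen in a small sketch. The sales rows are hypothetical, and SQLite (through Python's sqlite3) stands in for PostgreSQL: having filters on the per-customer total, while where filters individual rows before they are summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, quantity_sold INTEGER);
    INSERT INTO sales VALUES (1,3),(1,4),(2,2),(2,1),(3,6);
""")

# HAVING: keep customers whose TOTAL exceeds 5.
having_rows = conn.execute("""
    SELECT customer_id, SUM(quantity_sold) AS units_sold
    FROM sales
    GROUP BY customer_id
    HAVING SUM(quantity_sold) > 5
    ORDER BY customer_id
""").fetchall()

# WHERE: keep only individual sales above 5, then sum what is left.
where_rows = conn.execute("""
    SELECT customer_id, SUM(quantity_sold) AS units_sold
    FROM sales
    WHERE quantity_sold > 5
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(having_rows)  # [(1, 7), (3, 6)]
print(where_rows)   # [(3, 6)]
```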

ORDER BY clause

This clause sorts the query results in ascending or descending order.
It is especially useful with numeric values, where ordering reveals the highest or the lowest.
You can also order text values alphabetically, like in the example below:

select * from sales_data.projects p full join sales_data.employees e on p.employee_id = e.employee_id order by e.employee_id asc nulls last, p.project_name asc;

Window Functions

These are functions that perform a calculation across a set of rows related to the current row (the window). They are grouped by the kind of work they do: ranking, aggregation and value navigation. Typically, these functions add a new column to the query results.

Window functions are distinguished from ordinary functions by the over() clause, which instructs PostgreSQL not to collapse rows the way GROUP BY would, and instead to calculate the function for every row.

Ranking Functions on PostgreSQL:

These functions help to assign ranks or row numbers to the displayed results within a partition.

Examples of Ranking Functions:

Row_number - this function gives a unique number to each row. The numbering is sequential and has no gaps or duplicates.

select *, row_number() over() as id_column from sales_data.working_hub;

Rank - this function gives tied values the same number and leaves a gap after each tie, so the displayed numbers contain duplicates for the ties.

select *, rank() over() as id_column from sales_data.working_hub;

The code above returns every row numbered as 1, because without an order by inside over() all rows count as ties.

select *, rank () over(order by salary desc nulls first) as id_column from sales_data.working_hub;

This code, on the other hand, shows how the rank function works. The salary column is used to number the query results.
The 2 NULL values appear at the top because of nulls first. Both are numbered 1, and the next value is numbered 3.
The highest value generated in id_column is 7.

Dense rank - works like rank but without gaps. Tied values still share the same number, but the next distinct value gets the very next number.

select *, dense_rank() over (order by salary desc nulls first) as id_column from sales_data.working_hub;

The highest value generated in id_column is 6. The 2 NULL values at the top (nulls first) are both numbered 1, and the next value is numbered 2.
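
To compare the three ranking functions side by side, here is a minimal sketch on a made-up four-row table with one tie. It runs on SQLite (3.25 or later, for window-function support) through Python's sqlite3 module rather than PostgreSQL, and the table contents are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE working_hub (name TEXT, salary INTEGER);
    INSERT INTO working_hub VALUES ('Alice',900),('Bob',800),('Carol',800),('Dan',700);
""")

ranked = conn.execute("""
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM working_hub
    ORDER BY salary DESC, name
""").fetchall()

for row in ranked:
    print(row)
# ('Alice', 900, 1, 1, 1)
# ('Bob',   800, 2, 2, 2)
# ('Carol', 800, 3, 2, 2)   <- tie: same rank and dense rank
# ('Dan',   700, 4, 4, 3)   <- rank skips 3, dense rank does not

# Without ORDER BY inside OVER(), every row is a tie and gets rank 1.
unordered = [r[0] for r in conn.execute("SELECT RANK() OVER () FROM working_hub")]
```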

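Ntile - divides the ordered rows into the specified number of groups that are as equal in size as possible and returns each row's group number. For example, ntile(4) splits the rows into quartiles, much like a quartile function would on MS Excel.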
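
A small sketch of ntile on six made-up salaries, run on SQLite via sqlite3 as a stand-in for PostgreSQL. Six rows cannot be split evenly into four groups, so the first groups receive the extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE working_hub (salary INTEGER);
    INSERT INTO working_hub VALUES (100),(200),(300),(400),(500),(600);
""")

buckets = conn.execute("""
    SELECT salary, NTILE(4) OVER (ORDER BY salary) AS quartile
    FROM working_hub
    ORDER BY salary
""").fetchall()

print(buckets)
# [(100, 1), (200, 1), (300, 2), (400, 2), (500, 3), (600, 4)]
```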

Aggregate Functions on PostgreSQL:

These are the normal aggregate functions with the 'over' clause, which helps run the functions without grouping the rows.

select *, SUM (e.salary) over () as total_salary from sales_data.projects p full join sales_data.employees e on p.employee_id = e.employee_id order by e.employee_id asc nulls last, p.project_name asc;

The as clause creates a new column called total_salary.
This column repeats the total salary amount on every row, returning 242k across all rows in the result.
Other aggregate functions, such as AVG, MIN, MAX and COUNT, can be used with over() in the same way.

Value Navigation Functions on PostgreSQL:

These functions help to access other rows within the result in the window, therefore helping in comparisons based on the data returned in the column.

Lag function - This returns the value of the specified column from the previous row, in the order set inside over(). The first row has no previous row, so it gets NULL. For example:

select *, lag(salary) over (order by salary desc nulls first) as previous_data from sales_data.working_hub;

Lead function - This returns the value from the next row, acting like a relative cell reference. The last row has no next row, so it gets NULL. For example:

select *, lead(salary) over (order by salary desc nulls first) as next_data from sales_data.working_hub;
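
The following sketch puts lag and lead next to each other on three made-up salaries. It uses SQLite through Python's sqlite3 module as a runnable substitute for PostgreSQL; NULL shows up as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE working_hub (salary INTEGER);
    INSERT INTO working_hub VALUES (900),(800),(700);
""")

neighbours = conn.execute("""
    SELECT salary,
           LAG(salary)  OVER (ORDER BY salary DESC) AS previous_data,
           LEAD(salary) OVER (ORDER BY salary DESC) AS next_data
    FROM working_hub
    ORDER BY salary DESC
""").fetchall()

print(neighbours)
# [(900, None, 800), (800, 900, 700), (700, 800, None)]
```

A typical use is subtracting previous_data from salary to get the gap between consecutive rows.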

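First Value - first_value() returns the value from the first row of the window frame, while last_value() returns the value from the last row of the frame. When over() contains an order by, the default frame ends at the current row (and its ties), so last_value() only reaches the last row of the whole window if the frame is extended with rows between unbounded preceding and unbounded following.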
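
Because the window frame affects what last_value returns, a short sketch may help. The three salaries are invented, and SQLite (via sqlite3) is used in place of PostgreSQL; both apply the same default frame when an order by is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE working_hub (salary INTEGER);
    INSERT INTO working_hub VALUES (900),(800),(700);
""")

frames = conn.execute("""
    SELECT salary,
           FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS first_val,
           LAST_VALUE(salary)  OVER (ORDER BY salary DESC) AS last_default_frame,
           LAST_VALUE(salary)  OVER (
               ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full_frame
    FROM working_hub
    ORDER BY salary DESC
""").fetchall()

print(frames)
# [(900, 900, 900, 700), (800, 900, 800, 700), (700, 900, 700, 700)]
```

With the default frame, last_value simply echoes the current row; only the explicit full frame returns 700 on every row.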

Enriching window functions

Partition by divides the result set into logical groups based on the specified column while preserving individual rows. Unlike a GROUP BY clause, it does not collapse rows but instead allows the window function to perform calculations within each partition. For example:

select *, SUM (e.salary) over (partition by project_name) as total_salary from sales_data.projects p full join sales_data.employees e on p.employee_id = e.employee_id order by e.employee_id asc nulls last, p.project_name asc;

The results resemble what a SUMIF function would return in Excel: the sum range is the salary column, the criteria range is the project_name column, and the criterion is the project name displayed on that specific row.
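
As a rough illustration, the sketch below compares a partitioned sum with an unpartitioned one. The single assignments table and its rows are hypothetical (the article joins projects and employees instead), and SQLite via sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignments (project_name TEXT, name TEXT, salary INTEGER);
    INSERT INTO assignments VALUES
        ('Alpha','Alice',900),('Alpha','Bob',800),('Beta','Carol',700);
""")

totals = conn.execute("""
    SELECT project_name, name, salary,
           SUM(salary) OVER (PARTITION BY project_name) AS project_total,
           SUM(salary) OVER ()                          AS grand_total
    FROM assignments
    ORDER BY project_name, name
""").fetchall()

for row in totals:
    print(row)
# ('Alpha', 'Alice', 900, 1700, 2400)
# ('Alpha', 'Bob',   800, 1700, 2400)
# ('Beta',  'Carol', 700,  700, 2400)
```

Every row is preserved; only the scope of the sum changes between the two columns.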


How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI

GRACE MUTHONI MWANGI — Mon, 09 Feb 2026 22:56:10 +0000

Messy data no longer has to be scary! Think of it as a blessing in disguise; during the clean-up of the data, you get opportunities to interact with and understand the data. Therefore, you can easily come up with suitable dimension tables and fact tables from your data, which is helpful during modelling.

Getting data on Power BI

The first step is always loading your data into Power BI. Power BI has over 150 default connectors, including files (CSV/Text, MS Excel, JSON, etc.), databases (SQL, PostgreSQL, MySQL, Oracle, etc.), online services, Azure and many more.
With so many connectors available, loading new data can be frustrating for a beginner, especially if they are not sure of the data source and/or type.

Best practice when adding data

Open Power BI > Blank Report
Select 'Import data from Excel' > Change file type to 'All files'
Click open > Select data tables that you want to see on your report > Load/Transform data

Getting additional data from other files/sources:

When analysing data from different files and sources, you add each additional dataset separately, as there is no provision to load datasets from different sources at once.

On the same report containing the data that was loaded on Power BI, you will need to:

On the Home Tab > Get data > More > Select 'All' > Connect
Proceed to follow the rest of the steps as if adding new data

Transforming data

Transforming data is the term used for cleaning messy data in Power BI. Once you load new datasets into Power BI, it automatically identifies and categorizes the uploaded data.

Disclaimer: The newly loaded data is not always in a format suitable for analysis. Therefore, you will usually need to modify it, and these modifications are done in the Power Query Editor.

Power Query Editor

You can open the Power Query Editor window from the Report, Model or Table view in Power BI. In any of these views, navigate to the Home tab > Transform data

A separate window opens containing the data tables uploaded earlier

Power Query Editor Ribbon

The Ribbon has multiple tabs that will help with commands used for transforming data:

File Tab - Contains commands that are related to managing the working file on the Power Query Environment
Home Tab - Contains commands used in data preparation and organization, such as merging queries, changing data types, etc.
Transform Tab - Contains commands which help modify existing columns, such as grouping, pivoting, splitting and merging texts and columns, etc.
Add Column Tab - Contains commands for creating new columns derived from existing data columns
View Tab - Contains the commands controlling the visibility of panes and profiling tools to validate and inspect data quality
Tools Tab - Contains commands which offer diagnostic help options for troubleshooting queries
Help Tab - Provides access to documentation, learning resources, and support for Power Query

Transforming messy data on Power BI

The importance of data transformation is:

Data cleaning and quality: Data transformation identifies and resolves errors, missing values, and inconsistencies, which increases the trustworthiness of reports.
Structuring for analysis: Transformations help format data correctly (e.g., changing data types, pivoting/unpivoting) so it can be effectively used in visualisations.
Performance optimization: Unnecessary columns or rows are removed, reducing data volume and enhancing report load speeds.
Automations: Repetitive, manual data cleaning tasks in Excel can be automated using Power Query, ensuring consistent, repeatable processes.
Data Enrichment: Creates new calculated columns, splits/merges columns, and combines data from multiple sources to create a unified data model.

DAX (DAX Guide)

DAX stands for Data Analysis Expressions, which is the formula language used in Power BI. Just like in Excel, there are different formulas for different types of data.

Text functions are used to manipulate strings. The same changes can also be made during data transformation, as most of the text functions are available when you right-click on the text column that you wish to modify
Logical functions assess logical expressions and return information about the values or sets in the expression, e.g. IF, OR, AND, SWITCH
Aggregation functions return values by applying an aggregation to a column or to an expression, e.g. SUM and SUMX, AVERAGE and AVERAGEX
Date and time functions create calculations based on dates and time. Many of the functions in DAX are similar to the Excel date and time functions, e.g. DATEDIFF, DATEVALUE, etc.
Filter functions help manipulate tables and filter contexts.
Time intelligence functions are calculations used to compare and aggregate data over time periods, supporting days, months, quarters, and years, e.g. ENDOFMONTH, ENDOFYEAR

Dashboards

Dashboards are collections of visualizations that help explain data to non-technical users. Different visuals are organized in one view and clearly show the insights drawn from the data

Features of a good dashboard

Important KPIs should be easy to see
Layout should be well organized, with KPIs at the top and charts below
Should be one page
Should be easy to understand and interactive
Should be able to support easy decision-making
Visuals included should not have generic headings; headings ought to be clear, concise and easily understandable.
Slicers within the dashboard should not be broken; they should be connected to all visuals within the dashboard

Visualization

When creating a dashboard using the report view, you plot your dashboard on the canvas section of the page. The various visualizations are available in the Visualizations pane.

One can make various adjustments to the selected visualization in terms of plotting, formatting and layout.

Different visualizations:

Line chart - uses lines to visualize trends
Area charts - have the areas filled and show magnitude
Stacked column chart - show both categories' and subcategories' contributions to the total. They have the sum of values of subgroups.
Stacked bar charts - same as the stacked column chart but plotted horizontally
Pie charts - shows proportions of data categories
Donut chart - the same as a pie chart but has a hole in the middle
Treemaps - show hierarchy relationships of the plotted data
Card visual - shows the aggregated value of 1 entry, specifically a KPI. One can also use the Multi-row card
Bubble map – these are geographical maps with bubbles
Filled map – coloured geographical regions
Scatter charts - show correlation between 2 variables and help identify outliers in data
Funnel charts - show step-by-step flow in data (sequence)
Waterfall charts - visualize cumulative data, showing how a running total changes as values are added or subtracted
Combo charts - combine column and line charts
Bar charts - compare or track changes in data using horizontal bars
Tables - these are used when plotting data as exact values with categories on the rows and summations on the columns
Matrix Tables - these are used when plotting data as exact values with 2 sets of categories, one on the rows, another on the column and summations for each segment
Column charts - compare or track changes in data using vertical bars
Slicers - help bring interactivity within the dashboard based on categories

Understanding schemas and data modelling in Power BI

GRACE MUTHONI MWANGI — Mon, 02 Feb 2026 01:05:22 +0000

Power BI is a vital tool when it comes to connecting various data sets from different sources, transforming that data, and creating interactive, visual reports and dashboards.

Many beginners wonder about the difference between schemas and data models. A schema is a design that conceptualizes how to structure one's data for analysis, while a data model is the implementation of the selected schema. Data modelling in Power BI therefore depends on the schema that was designed first.

Common terminologies when dealing with schemas and data modelling:

Fact table - stores key business data, such as sales or transactional data, which changes regularly based on adjustments and projections

A fact table will normally contain dimension key columns that relate to dimension tables, and numeric measure columns that allow summarization

Dimension table stores the additional information related to the transactional data (lookup/descriptive data) and is mostly constant

A dimension table contains a key column (or columns) that acts as a unique identifier and other columns which support filtering and grouping your data.

Normalized data describes data that's stored in a way that reduces repetitious data
Denormalized data describes data that is stored in a way that has repetitious data in rows

1. Schema Design (The Planning Stage)

Before bringing data into Power BI, you must determine how to structure it for analysis. A star schema is generally the preferred approach, where you organize data into a central fact table surrounded by dimension tables.

A well-planned schema ensures better performance, easier DAX, and more accurate reports.

Actions to execute: Deciding which tables are facts and which are dimensions and identifying relationships between the two.

Types of schemas:

1. Star Schema

In a star schema, a fact table is surrounded by multiple dimension tables. The Power BI engine works best with a star schema, where the fact table sits in the middle and the dimension tables surround it

2. Snowflake schema
A snowflake schema has the same layout as the star schema but extends it further: some or all of the dimension tables are divided into sub-dimension tables
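
The idea of a star schema can be sketched outside Power BI with one fact table and one dimension table. Everything here is an assumption made for illustration: the table names (fact_sales, dim_product), their columns and their rows, and the use of SQLite through Python's sqlite3 module. The fact table holds a key column plus numeric measures, and the dimension table supplies the attribute used for grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per product, with descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
    -- Fact table: one row per sale, with a dimension key and numeric measures.
    CREATE TABLE fact_sales (product_id INTEGER, quantity INTEGER, amount REAL);

    INSERT INTO dim_product VALUES
        (1,'Pen','Stationery'),(2,'Notebook','Stationery'),(3,'Mouse','Electronics');
    INSERT INTO fact_sales VALUES (1,10,50.0),(2,5,75.0),(3,2,40.0),(1,4,20.0);
""")

# Summarize the measure by a dimension attribute.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product d ON f.product_id = d.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(sales_by_category)  # [('Electronics', 40.0), ('Stationery', 145.0)]
```

In a snowflake version of this sketch, category would move out of dim_product into its own sub-dimension table.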

2. Data Modeling (The Implementation Stage)

Data modeling is the actual, hands-on process in Power BI where you apply the schema design.

It involves importing, cleansing in Power Query, creating tables, and establishing relationships (e.g., one-to-many) in the Modeling view.
What you are doing: Implementing the star schema using Power Query and the Relationship View.

Semantic models:
A semantic model consists of all connected data, transformations, relationships, and calculations. To follow the flow of Power BI, you first connect to data, transform data, and create relationships and calculations to create a semantic model.

Relationships in Power BI

Relationships in Power BI are connections between tables based on common columns. These connections enable data from multiple sources to be used in a single, accurate report. The relationships also ensure that slicers and other visuals filter the related tables correctly.

Common types of relationships within Power BI include:

One-to-Many (1:M) / Many-to-One (M:1): The most common, ideal relationship where one dimension table links to multiple rows in a fact table (e.g., Product table to Sales table).
One-to-One (1:1): Each record in table A matches exactly one record in table B; rare, often suggesting tables should be merged.
Many-to-Many (M:M): Multiple rows in one table match multiple rows in another; used when direct relationships are not possible, though often requiring bridge tables.
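
The three relationship types come down to whether the key values are unique on each side. The helper below is a hypothetical Python sketch of that idea only; it is not how Power BI itself is implemented, and the function name and sample keys are made up.

```python
def cardinality(left_keys, right_keys):
    """Classify a relationship by whether each key column holds unique values."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Product table (unique IDs) related to a Sales table (repeated IDs).
product_ids = [1, 2, 3]
sales_product_ids = [1, 1, 2, 3, 3]
print(cardinality(product_ids, sales_product_ids))  # one-to-many
```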

Creating relationships on Power BI:

Using the auto-detect feature on Power BI
When you load data containing multiple tables into Power BI, it automatically attempts to find and create relationships for you. These relationships are mainly determined by the names of the columns within your data tables. To run the detection yourself, on the Modeling tab, select Manage relationships > Autodetect
Manually creating a relationship
This is necessary when Power BI is unable to automatically detect relationships between tables, especially when the table columns have different names.

To manually create a relationship:

On the Modeling tab, select Manage relationships > New.
In the Create relationship dialog box, in the first table drop-down list, select a table. Select the column you want to use in the relationship.
In the second table drop-down list, select the other table you want in the relationship. Select the other column you want to use, and then select OK.

Key Aspects of Power BI Relationships:

Cardinality: Defines the nature of the relationship, i.e. how many rows are related between tables. This would be One-to-many (most common, e.g., one customer to many orders), One-to-one, or Many-to-many.
Cross-filter Direction: Determines how filters flow between tables. Single (default, one side filters many side) or Both (bidirectional, filters flow both ways), though 'both' should be used cautiously.
Active vs. Inactive: Only one active relationship can exist between two tables for direct filtering, but multiple inactive relationships can be defined for use in DAX calculations.
Autodetect: Power BI can automatically find and create relationships based on matching column names during data load, though manual configuration is often needed.


Data analysis using excel made easy

GRACE MUTHONI MWANGI — Sat, 24 Jan 2026 15:44:52 +0000

Microsoft Excel is one of the most widely used tools in day-to-day organisational workflows. Across multiple industries, companies rely on MS Excel to:

Collect data
Organize data
Analyze and calculate numbers
Visualize data using tables and charts

To effectively master MS Excel and thrive within most corporate environments, this learning roadmap is recommended:

Understanding Microsoft Excel
Data entry and navigation
Data cleaning and formatting
Basic calculations and core functions
Data analysis with tables, pivot tables & charts
Data visualization with dashboards

Understanding Microsoft Excel

One needs to familiarise oneself with the Excel interface, workbook structure, terminologies, worksheets, and basic navigation tools. A solid understanding of these foundational concepts enables users to navigate spreadsheets efficiently and reduces errors when working with data.

Common terminologies:

Workbook
This is an Excel file that contains one or more worksheets. It serves as the main container for storing and managing related datasets within a single file.
Worksheet
This is an individual spreadsheet within a workbook, made up of rows and columns where data is entered, stored, and analyzed. Each worksheet can hold a separate dataset or a different stage of analysis
Cell
This is the basic unit of a worksheet where data, such as text, numbers, or formulas, is entered
Excel reference (cell reference)
This is the location of a specific cell in a worksheet using the column letter and row number, because cells are only formed at an intersection between a row and a column.
Examples: A1, V234
Cell range/Array
This is a group of two or more selected cells.

Excel interface:

Understanding the Microsoft Excel interface is essential for efficient navigation, accurate data entry, and effective analysis. The interface is composed of several key components, each designed to support different stages of data management and analysis.

A - Name Box
It is located on the left side of the formula bar. Shows the location of the active cell (e.g., A1, C5, G10).
This can also be used to jump/navigate to another cell

B - Formula Bar
This is located above the worksheet grid and right under the Ribbon.
It shows what is in the active cell (text, number, or formula). One can click inside it to edit cell content.

C - Ribbon
The main toolbar at the top of an Excel window. Contains tabs like Home, Insert, Page Layout, Formulas, Data, Review, View.
Each tab has groups (e.g., Font, Alignment, Number under Home).

D - Quick Access Toolbar
This is the small toolbar at the top left.
Has common icons such as Save, Undo and Redo. One can also add their favourite commands.

E - Tabs
These are located within the ribbon.
Every tab contains a set of commands that are related to each other e.g:
Home - Formatting, editing, and basic clipboard operations
File – File management tasks such as saving, opening, printing, and sharing
Data - Data import, sorting, filtering, and analysis tools
View - Worksheet display options and window management tools

F - Share button
This is only used when the workbook in question is made available online. One can share it with different collaborators and edit their permissions

Columns
These are the vertical groups of cells. They are labelled A, B, C, … at the top.

Rows
These are the horizontal groups of cells. They are labelled 1, 2, 3, … at the left

Data entry and navigation

How to enter data (data entry)

To enter data in a cell:

Click on the cell you would like to edit
Type your data inside the selected cell
On your keyboard, press "Enter" key to move down or "Tab" key to move right

Editing data

Excel provides multiple methods for editing existing cell contents

Double-click the cell and edit directly
Click the cell to be edited once, then edit in the Formula Bar
Alternatively, select the cell and press F2 to edit inside the cell

The methods indicated above allow users to correct errors, update values, and modify formulas efficiently.

Navigating with Keyboard and Mouse

One can comfortably navigate a Microsoft Excel worksheet using the keyboard or mouse. Mastering navigation techniques improves speed and reduces reliance on manual scrolling, especially when working with large datasets.

Here are some of the common shortcuts one can apply when using their worksheets:

Keyboard:

Arrow keys → move one cell at a time.
Tab → move one cell to the right.
Shift + Tab → move one cell to the left.
Enter → move down one cell.
Shift + Enter → move up one cell.
Ctrl + Arrow → jump to the edge of data (end of a continuous block).
Ctrl + Home → go to A1.
Ctrl + End → go to the last used cell.
Ctrl + S → save.

Mouse:

Click a cell to select it.
Click and drag to select multiple cells (range).
Scroll with the mouse wheel to move up/down.
Drag the scroll bar at the bottom to move left/right.

Data cleaning and formatting

Data cleaning/data processing involves identifying and fixing errors, inconsistencies, and missing or incorrect values in a dataset so that the data becomes accurate, complete, consistent, and ready for analysis.

Data formatting refers to the process of changing the appearance, structure, and display style of data in a worksheet without altering the underlying values. The purpose of data formatting is to improve readability, ensure consistency, and make datasets easier to interpret, analyze, and present.

Proper formatting helps users quickly distinguish between different data types (such as text, numbers, dates, and currency), identify patterns and trends, and produce professional-looking reports and dashboards.

Establishing & removing duplicates

Duplicate rows can cause incorrect totals or confusion when analysing data.

To know whether a dataset has duplicate values, you need to:

Select the column that would contain unique values such as identification numbers (e.g., A1:A200)
Navigate to Home tab > Conditional Formatting > Highlight Cells Rules > Duplicate Values. This highlights the duplicate values in the selected column with a colour of your choice

To remove duplicates:

Select your dataset (e.g., A1:D200)
Go to Data tab > Remove Duplicates
Ensure you have “My data has headers” ticked if the first row has column titles
Select the columns that define a duplicate (e.g., Name and Email)
Click OK
Excel shows a message with how many duplicate rows were removed
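
For readers who want to see the logic behind these steps, here is a rough Python sketch. It assumes, as in Excel, that the first occurrence of each duplicate is the one kept; the function name and sample rows are hypothetical.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept, len(rows) - len(kept)


contacts = [
    {"Name": "Amina", "Email": "amina@example.com", "City": "Nairobi"},
    {"Name": "Brian", "Email": "brian@example.com", "City": "Kisumu"},
    {"Name": "Amina", "Email": "amina@example.com", "City": "Mombasa"},
]

# Name and Email define a duplicate, so the third row is removed.
unique_contacts, removed_count = remove_duplicates(contacts, ["Name", "Email"])
print(removed_count)  # 1
```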

Identifying empty cells

This is done by filtering data. Filtering hides rows that don't meet the criteria and shows only those that do, so filtering a column for blanks lists the rows with empty cells.

Steps:

Click anywhere in your table (with headers)
Go to Data > Filter (or Home > Sort & Filter > Filter)
Small dropdown arrows appear on header cells
Click a dropdown, then check/uncheck specific values (tick (Blanks) to show only the empty cells) or use text filters, number filters, or date filters

Correcting Data Types

Ensuring that each column contains the correct data type is a critical step in data preparation. Incorrect data types can prevent formulas from working properly, distort calculations, and cause errors in charts and pivot tables. Common data types in Excel include text, numbers, dates, and currency values.

Number Formatting
Number formats change how numbers look without changing their actual value.

Common number formats:

General (default)
Number (can show decimal places)
Currency (shows a currency symbol)
Accounting
Percentage (%)
Date
Time

Steps:

Select the cells with numbers.
Home tab > Number group.
Use the dropdown to choose Number, Currency, Percentage, Short Date, etc. Additional options: Increase/Decrease Decimal buttons to control decimal places

Text Formatting
Steps:

Select the cells (e.g., A1:D1).
Go to the Home tab > Font group:
Click B for Bold
Click I for Italic
Click U for Underline
Change font type (e.g., Calibri, Arial)
Change font size (e.g., 11 → 14)
Change font color

Conditional Formatting
Conditional formatting automatically formats cells based on rules.
Steps:

Select the data range (e.g., B2:B20)
Home tab > Conditional Formatting
Choose a rule type:
Highlight Cell Rules (Greater Than, Less Than, Between, Equal To)
Top/Bottom Rules
Data Bars (fill cells proportionally)
Color Scales (gradients)
Icon Sets (arrows, flags, etc.)

Cell Formatting
Involves adjusting column widths, row heights, borders, shading, and alignment to structure the worksheet and separate different sections of data clearly.

Sorting Data

Sorting changes the order of rows based on chosen columns.

Types of sorting:

Text: A to Z or Z to A
Numbers: Smallest to Largest or Largest to Smallest
Dates: Oldest to Newest or Newest to Oldest

Data validation (Dropdowns and rules)

Data validation controls what users can enter in a cell.

Example – Dropdown list:

List the items (Apples, Oranges, Bananas) in a range (e.g., G1:G3), or type them directly in the Source box later.
Select A2:A20 where you want the dropdown.
Data tab > Data Validation.
Under Allow, choose List; in Source, enter =$G$1:$G$3 (or type Apples,Oranges,Bananas) and click OK.

Importance of data formatting in data analysis:

Improves clarity and readability of datasets
Reduces misinterpretation of values and results
Ensures consistency across reports and dashboards
Enhances the professional presentation of analytical outputs
Supports accurate sorting, filtering, and visualization

Basic calculations and core functions

Formulas and functions are instructions that tell Excel how to perform calculations and manipulate data. Every formula in Excel begins with an equals sign (=) and may include numbers, cell references, operators, and built-in functions. Mastering these core functions is essential for accurate data analysis, automation, and reporting.

Aggregate Functions

Aggregate functions summarize a group of values into a single result. They are commonly used to analyze totals, averages, ranges, and record counts.

Sum – adds a range of numbers to return a total
Average – calculates the mean value of a range
Min – returns the smallest value in a dataset
Max – returns the largest value in a dataset
Count – counts the number of cells that contain numeric values

Key pointers:
Relative reference: This reference adjusts automatically when the formula is copied from one cell to another
Eg: B1, E1, etc.

Absolute reference: This reference stays fixed on the same cell even when the formula is copied or moved
Eg: $B$1, $E$1, etc.
The 2 references can also be used together as a mixed reference, which fixes only the column or only the row
Eg: $B1, B$1, etc.

Conditional Functions

Conditional functions perform calculations only when specified conditions or criteria are met. They are essential for filtering and targeted analysis.

Sumif/Sumifs - adds values that meet one or multiple conditions
Countif/Countifs - counts records that satisfy one or more criteria
Averageif/Averageifs - calculates the average based on conditions
Maxifs/Minifs - returns the highest or lowest value that meets given criteria

Logical Functions

These functions return results based on logical conditions and are widely used for decision-making, classification, and data validation.

If - returns one value if a condition is true and another if it is false
And - returns TRUE if all conditions are true
Or - returns TRUE if at least one condition is true
Not - reverses the logical result of a condition

Lookup Functions

Vlookup - searches vertically down the first column of a range and returns a value from the same row
Hlookup - searches horizontally across the first row of a range and returns a value from the same column
Index & Match - a flexible combination for advanced lookups in any direction
Xlookup - supports exact and approximate matches with fewer limitations

Date and Time Functions

These functions support project planning, aging analysis, payroll calculations, and performance tracking

Today() - returns the current date
Now() - returns the current date & time
Datedif() - calculates the difference between two dates in a chosen unit such as days, months, or years (old date, new date, unit)
Networkdays() - counts working days between two dates, excluding weekends and, optionally, a list of holidays
Workday() - returns a future or past working date based on a given number of days
Edate() - adds or subtracts months from a given date
Datevalue() - converts a date stored as text into a date value that Excel can calculate with

Common terminologies used for functions

Lookup value – the value being searched for
Range – a group of selected cells
Dataset – a well-structured collection of data used for analysis
Criteria – the condition that determines which values are included
Return array – the range containing the result to be retrieved
Fill handle - the tool used to copy formulas across cells
Nesting - combining formulas within formulas

Additional Pointers:
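Some of these functions are not available in older Microsoft Excel versions; for example, Xlookup requires Excel 2021 or Microsoft 365, and Maxifs/Minifs require Excel 2019 or later
Formula errors differ based on the cause. Common errors include: #DIV/0! (division by zero), #REF! (invalid cell reference), #NAME? (unrecognized function or name), etc.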

Data analysis with tables, pivot tables & charts

Tables

A Table is not just formatting; it adds powerful functionality for analysis, formulas, sorting, filtering, and reporting

Key benefits

Automatically expands when you add new data
Built-in sorting and filtering
Cleaner formulas using column names (structured references)
Automatically copies formulas down
Makes charts and PivotTables more reliable
Once activated, the Total Row is added automatically at the bottom of the table (each column's total cell can use a different function, such as sum or average)

Before creating a table, ensure:

The first row contains headers (Employee ID, First Name, Salary, Department, etc.)
No completely blank rows
No completely blank columns

Steps to follow when creating a table in Microsoft Excel:

Click anywhere inside the dataset
Press Ctrl + T or alternatively Go to Home → Format as Table
Confirm the range Excel selects
Ensure “My table has headers” is checked
Click OK

Method 2:

Click anywhere inside the dataset
On the ribbon, navigate to the Insert tab > Table
Excel highlights the entire dataset, provided that it contains no completely blank rows or columns
Click OK in the pop-up box
Format the table on the Table Design tab

Key Note: Should you need to convert a table back to normal data, go to the Table Design tab, click Convert to Range, and confirm

Pivot tables

These allow you to summarize, analyze, and explore large datasets. Large amounts of data are often summarized by grouping, counting, summing, or averaging values

Understanding PivotTable Areas
Pivot tables have 4 areas that appear in the field list when building a PivotTable:

Rows = what you want to group by
Columns = how you want to split the groups
Values = what you want to calculate. In this field, one can change value calculations to sum, count, min etc

Steps:

Click dropdown on a value field
Value Field Settings
Choose calculation type

Filters = high-level filtering for the entire PivotTable. This area can hold multiple filter conditions; to get the best results, always organize the fields by the criteria you want to see the data by

Key Notes:
Pivot tables do not inherit number formatting. This can be adjusted in the value field settings
One can group numerical data on Pivot tables to create bands for grouped analysis. For example, grouping a cost column into bands shows the number of orders in each cost range

Pivot charts

These are visual insights linked to pivot tables that have already been created from the data. They are easy to understand and communicate, mainly representing comparisons, trends, distributions, and proportions. To create a pivot chart, click inside the PivotTable and navigate to PivotTable Analyze > PivotChart. You get a list of all the charts available and can select the most suitable one based on the pivot data.

Types of charts

Column charts - compares values across categories using vertical bars
Bar charts - compares values across categories using horizontal bars
Line charts - shows trends over time using a line plot
Pie charts - shows proportions of a whole
Donut charts - similar to pie charts, but with a hole at the center like a doughnut
Stacked column charts / 100% stacked column charts - show values broken into sub-categories; the first stacks the actual totals and the second shows each sub-category as a percentage of the total
Area chart - shows trends over time with filled areas
Scatter (XY) chart - shows relationship between two numeric variables
Histogram - shows distribution of numeric data
Box and Whisker chart - shows median, quartiles, and outliers
Combo chart - combines two charts such as a bar and line chart
Waterfall chart - shows how positive and negative values build to a total

You can modify the chart properties on the Design tab > chart layouts.

Data visualization with dashboards

A Microsoft Excel dashboard is a single interactive screen that visually summarizes the most important data insights at a glance. It combines PivotTables, charts, KPIs, slicers, and good layout design to support decision-making

Non-negotiable dashboard principles:

Dashboards should be one screen only (no scrolling)
Focus on key KPIs, not raw data
Consistent colors and fonts
Clear titles and labels
Interactive but simple

Decoding Git and GitHub

GRACE MUTHONI MWANGI — Sat, 17 Jan 2026 09:22:45 +0000

GitHub

This is a web-based platform that helps users store, manage, and collaborate on software projects.

Importance of GitHub:

Collaboration:
Multiple people can work on the same project without overwriting each other’s work and with full visibility. Team members can view updates, contribute changes, and communicate through the repository. For private repositories, collaborators must be explicitly added by the repository owner.
Code Storage & Backup:
GitHub acts as a cloud-based repository where your code is safely stored and accessible from anywhere. This minimizes the risk of losing one's code.
Portfolio & Career Growth:
Projects hosted on GitHub can serve as a portfolio to showcase one's skills to potential employers.
Open-source code contributions:
GitHub hosts many open-source projects whose code is freely available and, subject to each project's license, can be modified and integrated into other projects.
Version control:
GitHub allows one to track and manage changes made to code.

Creating a GitHub account

Visit the official GitHub site
Click on Sign In >> Continue with Google
Choose your email and username
Click on create account >> Adjust your profile details and save

Git (Git Bash)

Git is a version-control system, whilst Git Bash is a command-line shell, installed with Git for Windows, that lets a user run Git commands and move code between their local machine and their web-based GitHub account.

Git Bash installation:

Head to the Git official website
Install the application; ensure you install an app that is compatible with your operating system.

Connecting Git Bash to GitHub

A successful connection is determined by a couple of steps undertaken in the sequence shown below:

Configuration

git config --global user.name "Your Name" - This command is meant to help set up your username
git config --global user.email "your.email@example.com"- This command is meant to help set up your email address
git config --list - This command is meant to verify your configuration
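The three configuration commands above can be tried end to end without touching real settings. The sketch below is a minimal example, assuming a throwaway HOME directory (created with mktemp purely for this demonstration) so that --global writes to a temporary file; the name and email are the placeholders used above.

```shell
# Point HOME at a scratch directory so --global does not touch real settings.
# (This HOME swap is an assumption for the demo; skip it when configuring for real.)
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Read the values back one at a time instead of scanning the full --list output
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"
```

Reading a single key with --get is a convenient alternative to git config --list when only one value needs checking.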

Generating SSH Key

Run this command to check for existing keys
ls -al ~/.ssh
If you see files named id_ed25519 and id_ed25519.pub (or id_rsa and id_rsa.pub), you have an existing key pair. If not, you'll need to generate one
Generate the SSH key by running this command:
ssh-keygen -t ed25519 -C "your_email@example.com"
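In this step, you will be prompted for a file location in which to save the key and for an optional passphrase. Press Enter to accept the default location (~/.ssh/id_ed25519), or choose a location that you can easily remember and retrieve the key from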
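The check-then-generate steps above can be sketched as a small script. It is a minimal example under stated assumptions: the variable key_dir is hypothetical and points at a scratch directory rather than ~/.ssh, so running the sketch cannot overwrite a real key, and -N "" sets an empty passphrase only to keep the demo non-interactive (for a real key, drop -f and -N and answer the prompts).

```shell
# Scratch directory standing in for ~/.ssh (assumption for this demo only)
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519"

# Generate a key pair only if one does not already exist at that path
if [ -f "$key_file" ] && [ -f "$key_file.pub" ]; then
  echo "Existing key pair found at $key_file"
else
  # -f sets the output file, -N "" an empty passphrase (demo only), -q quiets output
  ssh-keygen -q -t ed25519 -C "your_email@example.com" -f "$key_file" -N ""
  echo "Generated new key pair at $key_file"
fi

# The .pub file is the public half, which is the part that gets pasted into GitHub
public_key="$(cat "$key_file.pub")"
echo "$public_key"
```

The file without the .pub extension is the private key and should never be shared.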

Starting SSH agent

To start the SSH agent, run this command:
eval "$(ssh-agent -s)"
This command varies based on the user's operating system
Add your private key to the running ssh-agent
ssh-add ~/.ssh/id_ed25519

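Adding the key to GitHub account

Copy your public key (the .pub file) to the clipboard; running cat ~/.ssh/id_ed25519.pub prints it so that it can be copied
Open the GitHub platform online
- Click on Profile >> Settings
- Click on SSH and GPG keys
- Click New SSH key >> paste the copied key >> Add SSH key
Test the connection by running ssh -T git@github.com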

Collaboration and version control

To start coding, you will need to create a new repository; this repository will contain the main code. If you need to test changes, create a branch. This allows you to make edits without affecting the main code.

To create a branch on Git Bash if you're using an existing repository:
git clone **paste the SSH URL of the repo** - downloads the repository from GitHub to your local machine
cd **repository name** - moves into the repository folder on Git Bash
git branch - lists the local branches within the repository
git pull - syncs any changes already on GitHub
git checkout -b **name of the branch** - creates the branch and switches you to the branch created
git push origin **name of the branch** - uploads the added branch to the repo on GitHub.
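The branch workflow above can be rehearsed offline. The following is a minimal sketch, assuming a local bare repository (remote.git, a hypothetical stand-in for the GitHub repository) and a hypothetical branch name feature-notes; with a real project the clone URL would be the repository's SSH URL instead.

```shell
set -e
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the GitHub repository: a bare repo on the local disk (assumption)
git init -q --bare remote.git

# Seed the "remote" with one commit so there is something to clone
git clone -q remote.git seed 2>/dev/null
cd seed
git config user.name "Your Name"
git config user.email "your.email@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin HEAD
cd ..

# The workflow itself: clone, enter the repo, create a branch, push the branch
git clone -q remote.git project
cd project
git branch                          # lists local branches; * marks the current one
git checkout -q -b feature-notes    # create the branch and switch to it
git push -q origin feature-notes    # publish the new branch to the remote

remote_branches="$(git ls-remote --heads origin)"
echo "$remote_branches"
```

The final listing should include refs/heads/feature-notes alongside the default branch, which confirms that the branch now exists on the remote.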

Once you have made and committed changes on a branch, you can proceed to merge them using the steps below:
git branch - lists the local branches within the repository

The branch highlighted in green is the branch that you are currently working on.
To move to the branch you want to merge into, use git checkout **name of the branch** (the cd command changes folders, not branches)

git pull - this helps sync the changes made by other collaborators

It is a good practice to ensure that the file is up to date before making other changes

git merge **name of the branch that has the changes** - this merges that branch into the branch you are currently on

Once you run this command, Git prints a summary of the changes brought in by the merge

git commit -m "**name of the branch** merged" - this records the merge; it is only needed if Git stopped for merge conflicts that you have since resolved
git push - this uploads the merged changes to the repository on GitHub
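To see the merge steps in action without touching a real project, here is a minimal local sketch. The repository, the file notes.txt, and the branch name feature-notes are all hypothetical; git pull and git push are left out because the sketch has no remote.

```shell
set -e
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Your Name"
git config user.email "your.email@example.com"
main_branch="$(git symbolic-ref --short HEAD)"   # main or master, depending on Git settings

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make a change on a separate branch
git checkout -q -b feature-notes
echo "second line" >> notes.txt
git commit -q -am "Extend notes"

# Switch back and merge the branch (by branch name, not file name)
git checkout -q "$main_branch"
git merge -q feature-notes

merged_content="$(cat notes.txt)"
git log --oneline
```

Because the main branch had no new commits of its own here, Git fast-forwards and no separate merge commit is created.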

To track the changes that have been made to code on GitHub, look for the following:

Green highlights indicate lines that have been added or modified
Red highlights indicate lines that have been deleted
Commits shows the number of changes pushed to the repository; each commit records what was changed

Additionally, you can go back to the original version on Git if the merged changes were not approved or are wrong. Best practice when using a public repository is to use git revert, which adds a new commit that undoes the changes and keeps the history intact. However, when you are the only user of a private repository, you can instead reset the branch to an earlier commit with git reset and force-push; this way, the commits that were pushed can no longer be seen in the history
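The difference between the two approaches is easiest to see on a scratch repository. This minimal sketch (the file name app.txt and the commit messages are hypothetical) shows git revert undoing a change while keeping it in the history; the reset alternative appears only as a comment, since it rewrites history.

```shell
set -e
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Your Name"
git config user.email "your.email@example.com"

echo "good" > app.txt
git add app.txt
git commit -q -m "Good change"

echo "bad" > app.txt
git commit -q -am "Bad change"

# Revert: adds a new commit that undoes the last one; history is preserved
git revert --no-edit HEAD >/dev/null

content_after_revert="$(cat app.txt)"
commit_count="$(git rev-list --count HEAD)"
echo "content: $content_after_revert, commits: $commit_count"

# Alternative on a repository only you use (rewrites history, not run here):
#   git reset --hard HEAD~1 && git push --force
```

After the revert the file is back to its earlier content, yet both the bad commit and the commit that undid it remain visible in git log.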

Common terminologies

Pushing code - uploading changes from one's local machine to GitHub
Pulling code - downloading changes from GitHub to a local machine
Command line - a text interface for giving instructions to the system by typing commands
Version control - tracking and managing changes to code over time
Repository (Repo) - storage location for project files, code, and their change history; GitHub hosts repositories in the cloud
- Public repositories are accessible to everyone on GitHub
- Private repositories are only accessible to the owner and the collaborators they add
  
  Use these commands to copy an existing repository from GitHub
  git clone followed by the SSH URL of the repo
  cd repository name to move into the repo on Git Bash
  git branch to see the branches available

Common command lines:

git --version - This helps check the installed Git version
mkdir "Name of the folder" - This creates a folder on your local machine from Git Bash
git status - This shows the state of your repository: the current branch plus changed, staged, and untracked files
cd "Created Folder" - This is used to change the directory to the folder that was created. All commands that follow this command are run within the specifically mentioned folder
touch "Name of file" - This helps create an empty file within a folder. You should include the file extension in the name, e.g. script.py or README.md
git init - This initializes a new repository in the current folder, which then tracks the files containing the code
git branch -a - This lists all the branches within the repository, both local and remote-tracking
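Put together, the commands above set up a new local project. Here is a minimal sketch, assuming a hypothetical folder my-project and file README.md created under a scratch directory:

```shell
set -e
cd "$(mktemp -d)"          # scratch location so nothing real is touched (assumption)

git --version              # confirm Git is installed
mkdir "my-project"         # create the project folder
cd "my-project"            # move into it
touch "README.md"          # create an empty file
git init -q                # turn the folder into a Git repository
project_status="$(git status --short)"   # "??" marks untracked files
echo "$project_status"
git branch -a              # prints nothing until the first commit exists
```

Note that git status only works after git init has been run inside the folder, which is why it comes last in this sketch.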
