DEV Community

Cover image for Data analysis using excel made easy
GRACE MUTHONI MWANGI
GRACE MUTHONI MWANGI

Posted on • Edited on

Data analysis using excel made easy

Microsoft Excel is one of the most widely used tools in day-to-day organisational workflows. Across multiple industries, companies rely on MS Excel to:

  • Collect data
  • Organize data
  • Analyze and calculate numbers
  • Visualize data using tables and charts

To effectively master MS Excel and thrive within most corporate environments, this learning roadmap is recommended:

  1. Understanding Microsoft Excel
  2. Data entry and navigation
  3. Data cleaning and formatting
  4. Basic calculations and core functions
  5. Data analysis with tables, pivot tables & charts
  6. Data visualization with dashboards

Understanding Microsoft Excel

One needs to familiarise oneself with the Excel interface, workbook structure, terminologies, worksheets, and basic navigation tools. A solid understanding of these foundational concepts enables users to navigate spreadsheets efficiently and reduces errors when working with data.

Common terminologies:
  1. Workbook
    This is an Excel file that contains one or more worksheets. It serves as the main container for storing and managing related datasets within a single file.

  2. Worksheet
    This is an individual spreadsheet within a workbook, made up of rows and columns where data is entered, stored, and analyzed. Each worksheet can hold a separate dataset or a different stage of analysis

  3. Cell
    This is the basic unit of a worksheet where data, such as text, numbers, or formulas, is entered

  4. Excel reference (cell reference)
    This is the location of a specific cell in a worksheet using the column letter and row number, because cells are only formed at an intersection between a row and a column.
    Examples: A1, V234

  5. Cell range/Array
    This is a group of two or more selected cells.

Excel interface:

Understanding the Microsoft Excel interface is essential for efficient navigation, accurate data entry, and effective analysis. The interface is composed of several key components, each designed to support different stages of data management and analysis.

A - Name Box
It is located on the left side of the formula bar. Shows the location of the active cell (e.g., A1, C5, G10).
This can also be used to jump/navigate to another cell

B - Formula Bar
This is located above the worksheet grid and right under the Ribbon.
It shows what is in the active cell (text, number, or formula). One can click inside it to edit cell content.

C - Ribbon
The main toolbar at the top of an Excel window. Contains tabs like Home, Insert, Page Layout, Formulas, Data, Review, View.
Each tab has groups (e.g., Font, Alignment, Number under Home).

D - Quick Access Toolbar
This is the small toolbar at the top left.
Has common icons such as Save, Undo and Redo. One can also add their favourite commands.

E - Tabs
These are located within the ribbon.
Every tab contains a set of commands that are related to each other e.g:
Home - Formatting, editing, and basic clipboard operations
File – File management tasks such as saving, opening, printing, and sharing
Data - Data import, sorting, filtering, and analysis tools
View - Worksheet display options and window management tools

F - Share button
This is only used when the workbook in question is made available online. One can share it with different collaborators and edit their permissions

Columns
These are the vertical groups of cells. They are labelled A, B, C, … at the top.

Rows
These are the horizontal groups of cells. They are labelled 1, 2, 3, … at the left

Data entry and navigation

How to enter data (data entry)

To enter data in a cell:

  • Click on the cell you would like to edit
  • Type your data inside the selected cell
  • On your keyboard, press "Enter" key to move down or "Tab" key to move right
Editing data

Excel provides multiple methods for editing existing cell contents

  • Double-click the cell and edit directly
  • Click the cell to be edited once, then edit in the Formula Bar
  • Alternatively, select the cell and press F2 to edit inside the cell

The methods indicated above allow users to correct errors, update values, and modify formulas efficiently.

Navigating with Keyboard and Mouse

One can comfortably navigate through a Microsoft Worksheet through their keyboard or mouse. Mastering navigation techniques improves speed and reduces reliance on manual scrolling, especially when working with large datasets.

Here are some of the common shortcuts one can apply when using their worksheets:

Keyboard:

  • Arrow keys → move one cell at a time.
  • Tab → move one cell to the right.
  • Shift + Tab → move one cell to the left.
  • Enter → move down one cell.
  • Shift + Enter → move up one cell.
  • Ctrl + Arrow → jump to the edge of data (end of a continuous block).
  • Ctrl + Home → go to A1.
  • Ctrl + End → go to the last used cell.
  • Ctrl + S → save.

Mouse:

  • Click a cell to select it.
  • Click and drag to select multiple cells (range).
  • Scroll with the mouse wheel to move up/down.
  • Drag the scroll bar at the bottom to move left/right.

Data cleaning and formatting

Data cleaning/data processing involves identifying and fixing errors, inconsistencies, and missing or incorrect values in a dataset so that the data becomes accurate, complete, consistent, and ready for analysis.

Data formatting refers to the process of changing the appearance, structure, and display style of data in a worksheet without altering the underlying values. The purpose of data formatting is to improve readability, ensure consistency, and make datasets easier to interpret, analyze, and present.

Proper formatting helps users quickly distinguish between different data types (such as text, numbers, dates, and currency), identify patterns and trends, and produce professional-looking reports and dashboards.

Establishing & removing duplicates

Duplicate rows can cause incorrect totals or confusion when analysing data.

To know whether a dataset has duplicate values, you need to:

  • Select the column that would contain unique values such as identification numbers (e.g., A1:A200)
  • Navigate to Home tab > Conditional formatting > Highlight cell rules > Duplicate values This action highlights the duplicate values by filling the selected column with a colour of choice

To remove duplicates:

  • Select your dataset (e.g., A1:D200)
  • Go to Data tab > Remove Duplicates
  • Ensure you have “My data has headers” ticked if the first row has column titles
  • Select the columns that define a duplicate (e.g., Name and Email)
  • Click OK
  • Excel shows a message with how many duplicate rows were removed
Identifying empty cells

This is done by filtering data. Filtering also hides rows that don’t meet criteria and shows only those that do.

Steps:

  • Click anywhere in your table (with headers)
  • Go to Data > Filter (or Home > Sort & Filter > Filter)
  • Small dropdown arrows appear on header cells
  • Click a dropdown: Check/Uncheck specific values Use text filters, number filters, or date filters
Correcting Data Types

Ensuring that each column contains the correct data type is a critical step in data preparation. Incorrect data types can prevent formulas from working properly, distort calculations, and cause errors in charts and pivot tables. Common data types in Excel include text, numbers, dates, and currency values.

Number Formatting
Number formats change how numbers look without changing their actual value.

Common number formats:

  1. General (default)
  2. Number (can show decimal places)
  3. Currency (shows a currency symbol)
  4. Accounting
  5. Percentage (%)
  6. Date
  7. Time

Steps:

  • Select the cells with numbers.
  • Home tab > Number group.
  • Use the dropdown to choose Number, Currency, Percentage, Short Date, etc. Additional options: Increase/Decrease Decimal buttons to control decimal places

Text Formatting
Steps:

  1. Select the cells (e.g., A1:D1).
  2. Go to the Home tab > Font group:
  3. Click B for Bold
  4. Click I for Italic
  5. Click U for Underline
  6. Change font type (e.g., Calibri, Arial)
  7. Change font size (e.g., 11 → 14)
  8. Change font color

Conditional Formatting
Conditional formatting automatically formats cells based on rules.
Steps:

  1. Select the data range (e.g., B2:B20)
  2. Home tab > Conditional Formatting
  3. Choose a rule type:
  4. Highlight Cell Rules (Greater Than, Less Than, Between, Equal To)
  5. Top/Bottom Rules
  6. Data Bars (fill cells proportionally)
  7. Color Scales (gradients)
  8. Icon Sets (arrows, flags, etc.)

Cell Formatting
Involves adjusting column widths, row heights, borders, shading, and alignment to structure the worksheet and separate different sections of data clearly.

Sorting Data

Sorting changes the order of rows based on chosen columns.

Types of sorting:

  • Text: A to Z or Z to A
  • Numbers: Smallest to Largest or Largest to Smallest
  • Dates: Oldest to Newest or Newest to Oldest
Data validation (Dropdowns and rules)

Data validation is a method of formatting to control what users can enter in a cell.

Example – Dropdown list:

  1. List items (Apples, Oranges, Bananas) somewhere (e.g., G1:G3) or type them directly later.
  2. Select A2:A20 where you want the dropdown.
  3. Data tab > Data Validation.
Importance of data formatting in data analysis:
  • Improves clarity and readability of datasets
  • Reduces misinterpretation of values and results
  • Ensures consistency across reports and dashboards
  • Enhances the professional presentation of analytical outputs
  • Supports accurate sorting, filtering, and visualization

Basic calculations and core functions

Formulas and functions are instructions that tell Excel how to perform calculations and manipulate data. Every formula in Excel begins with an equals sign (=) and may include numbers, cell references, operators, and built-in functions. Mastering these core functions is essential for accurate data analysis, automation, and reporting.

Aggregate Functions

Aggregate functions summarize a group of values into a single result. They are commonly used to analyze totals, averages, ranges, and record counts.

  • Sum – adds a range of numbers to return a total

  • Average – calculates the mean value of a range

  • Min – returns the smallest value in a dataset

  • Max – returns the largest value in a dataset

  • Count – counts the number of cells that contain numeric values

Key pointers:
Relative reference: This formula changes when the formula is moved from one cell to another
Eg: B1, E1, etc.

Absolute reference: This referencing ensures formulas remain the same even when they are copied or moved
Eg: $B$1, $E$1, etc.
The 2 references can also be used together

Conditional Functions

Conditional functions perform calculations only when specified conditions or criteria are met. They are essential for filtering and targeted analysis.

  • Sumif/Sumifs - adds values that meet one or multiple conditions
  • Countif/Countifs - counts records that satisfy one or more criteria

  • Averageif/Averageifs - calculates the average based on conditions

  • Maxifs/Minifs - returns the highest or lowest value that meets given criteria

Logical Functions

These functions return results based on logical conditions. These functions are widely used for decision-making, classification, and data validation

  • If - returns one value if a condition is true and another if it is false

  • And - returns TRUE if all conditions are true

  • Or - returns TRUE if at least one condition is true

  • Not - reverses the logical result of a condition**

Lookup Functions
  • Hlookup - searches vertically for a value and returns a corresponding result
  • Vlookup - searches horizontally across rows
  • Index & Match - a flexible combination for advanced lookups in any direction
  • Xlookup - supports exact and approximate matches with fewer limitations
Date and Time Functions

These functions support project planning, aging analysis, payroll calculations, and performance tracking

  • Today() - returns the current date
  • Now() - returns the current date & time
  • Datedif() - calculates the difference between two dates (Old date, new date)

  • Networkdays() - counts working days between two dates, excluding weekends and holidays

  • Workday() - returns a future or past working date based on a given number of days

  • Edate() - adds or subtracts months from a given date

  • Datevalue() - converts a text date into a valid Excel date format

Common terminologies used for functions
  • Lookup value– the value being searched for
  • Range – a group of selected cells
  • Dataset – a well-structured collection of data used for analysis
  • Criteria – the condition that determines which values are included
  • Return array – the range containing the result to be retrieved
  • Fill handle - the tool used to copy formulas across cells
  • Nesting - combining formulas within formulas

Additional Pointers:
Some of these functions will not work on some of the Microsoft Excel versions
Formula errors will differ based on the formula. Common errors include: #DIV/0!, #REF!, #NAME? etc.

Data analysis with tables, pivot tables & charts

Tables

A Table is not just formatting; it adds powerful functionality for
analysis, formulas, sorting, filtering, and reporting

Key benefits

  • Automatically expands when you add new data
  • Built-in sorting and filtering
  • Cleaner formulas using column names (structured references)
  • Automatically copies formulas down
  • Makes charts and PivotTables more reliable
  • The total row once activated is automatically added at the bottom row (One can add different functions such as sum, average of the columns)

Before creating a table, ensure:

  • The first row contains headers (Employee ID, First Name, Salary, Department, etc.)
  • No completely blank rows
  • No completely blank columns

Steps to follow when creating a table in Microsoft Excel:

  1. Click anywhere inside the dataset
  2. Press Ctrl + T or alternatively Go to Home → Format as Table
  3. Confirm the range Excel selects
  4. Ensure “My table has headers” is checked
  5. Click OK

Method 2:

  1. Click anywhere inside the dataset
  2. On the ribbon, navigate to the insert tab > insert table Excel highlights the entire dataset provided that there is no empty cell
  3. Click okay on the pop-up box
  4. Format the table on the Table Design tab

Key Note: Should you need to convert a table to normal data, on the Table Design click convert to range > Confirm

Pivot tables

These allow you to summarize, analyze, and explore large datasets. Large amounts of data are often summarrized by grouping, counting, summing,or averaging values

Understanding PivotTable Areas
Pivot tables have 4 areas that appear on the fieldlist during plotting:

  • Rows = what you want to group by
  • Columns = how you want to split the groups
  • Values = what you want to calculate. In this field, one can change value calculations to sum, count, min etc

Steps:

  1. Click dropdown on a value field
  2. Value Field Settings
  3. Choose calculation type
  • Filters = high-level filtering for the entire PivotTable. In this field you can have multiple filter conditions; to get the best results, always organize the fields in criterias you want to see the data by

Key Notes:
Pivot tables do not inherit number formatting. This can be adjusted in the value field settings
One can group numerical data on Pivot tables to create bands for grouped analysis. In the image below, you can tell the number of orders by cost (which is the grouped column)

Pivot charts

These are visual insights linked to pivots that are already created within the data. They are easy to understand and communiate , mainly representing comparisons, trends, distributions and proportions. To create pivot charts, you navigate to the PivotTable Analyze > pivot charts. You get a list of all charts available and you can select the most suitable chart based on the pivot data.

Types of charts

  1. Column charts - compares values across categories using vertical bars

  2. Bar charts - compares values across categories using horizontal bars

  3. Line charts - shows trends over time using a line plot

  4. Pie charts - shows proportions of a whole

  5. Donut charts - similar to a pie chart, it however contains a hole at the center like a doughnut

  6. Stacked column charts/ 100% Stacked column charts - shows total values broken into sub-categories one in totals and the other in percentages

  7. Area chart - shows trends over time with filled areas

  8. Scatter (XY) chart - shows relationship between two numeric variables

  9. Histogram - shows distribution of numeric data

  10. Box and Whisker chart - shows median, quartiles, and outliers

  11. Combo chart - combines two charts such as a bar and line chart

  12. Waterfall chart - shows how positive and negative values build to a total

You can modify the chart properties on the Design tab > chart layouts.

Data visualization with dashboards

A Microsoft Excl Dashboard is a single interactive screen that visually summarizes the most important data insights at one glance. It combines PivotTables, charts, KPIs, slicers, and good layout design to support
decision-making

Non-negotiable dashboard principles:

  • Dashboards should be one screen only (no scrolling)
  • Focus on key KPIs, not raw data
  • Consistent colors and fonts
  • Clear titles and labels
  • Interactive but simple

Top comments (0)