DEV Community

Cover image for INTRODUCTION TO MS EXCEL FOR DATA ANALYTICS
Derrick Kimanthi
Derrick Kimanthi

Posted on

INTRODUCTION TO MS EXCEL FOR DATA ANALYTICS

INTRODUCTION

If you are a starting out in data analytics, I am sure you have heard of excel. You see excel is a tool that helps you analyze data, create dashboards, perform data entries, automate tasks and so much more. In this guide i will explain how MS excel can be used for basic data analytics, in a simple and beginner-friendly language.

MS Excel Layout

In this article, i will assume you have already downloaded and installed Microsoft excel. When you open excel the first page you will see is the start up screen. After which you will be redirected to a page where you will select a blank workbook or templates of your choice depending on the type of tasks you are performing. For a beginner i recommend you starting with a blank book. This will enable you to practice entering data, applying formulas as well as exploring Microsoft Excel's features.
Once you select the blank work book, excel takes you to a new worksheet:
In the worksheet there is a:

  1. Title Bar- Displays the name of your workbook and the application (Excel).
  2. Quick Access Toolbar-Provides access to frequently used commands.

  3. Formula Bar- It shows the contents of a selected cell. You can also type and enter formulas and data.

  4. Ribbon-This is the main menu at the top ofthe work sheet, it's organized into tabs; whereby each contains a group of commands and tools.

  5. Cell-This is where you enter data. Each cell is identified by its column letter and row number (e.g., A1).

  6. Row-A horizontal line of cells, numbered along the left side of the worksheet.

  7. Column-A vertical line of cells, labeled with letters across the top of the worksheet.

  8. Scroll Bars-Allow you to move horizontally or vertically through your worksheet.
    Below is an Image that clearly illustrates the above:

An Excel Worksheet

Basic Formatting

This basically involves improving the data readability. It's simply done by adjusting the font and size, alingment and adding other formatting styles like bold, italics etc. These tools are located at the home tab.

Home Tab

Important Formatting techniques:

  1. Font Styles-Simply it consists of font types,sizes,color or making the text either bold, italic. You can also use the paint bucket to fill the entire text with a color of your choice.

  2. Number Formatting-This involves choosing the correct data type that custom-fits a column and a cell e.g. you can choose currency, dates, text number etc.

  3. Alingment-Adjusting the alingments enhaces data readability, you can align data center, top, right, bottom , vertical alingment and text wrap.

  4. Row & column sizing-You can adjust the height of rows, when your text or data doesnot fit you can raise the height of the row ensuring that data or text fits nicely.

  5. Freeze panes- The headers can be freezed allowing you to see the headers when scrolling in large data sets

freezing panes

Working with a large data set

When working with large data sets; you will need to apply some formatting and formulas that will make your work easier.
For Example:

Large data set

Data Validation

This is a tool that helps you control the type of data entered in a cell. It's essential as it helps maintain accuracy by restricting what is entered. For example you create a dropdown that limits a cell's selection to either Male or Female e.t.c. for gender selection.
To do this:

  • Select a column

  • Click the data tab

  • Then click data validation
    Data Manipulation

  • You will see a popup menu

  • Select the list in the dropdown Menu

  • Type the source as seen and click ok
    Confirmation

    Data Sorting

    This involves arranging data in a specific order:
    It includes:

  1. Alphabetical sortin- from A to Z.

  2. Number sorting-smallest to largest and viceversa.

  3. Date and time sorting-oldest to new and viceversa.
    To do this:

  • Select all the data by CTRL+A

  • Click sorting, then navigate to custom sorting.

  • Then select the column you want to sort.

Data Sorting

Data Filtering

its used to show data that meets the specific conditions while hiding the rest.
E.g Show sales above 10,000
How to do it:

  • Select your data

  • Click the data tab then filter

  • A small dropdown arrow will appear on each column header

Types of filtering include:

  1. Text Filters

  2. Number filters

  3. Date filters
    Illustration:

Data Filtering

Functions In Excel

Text Functions
Majorly these functions manipulate texts, they join and transform texts from one form to another e.g uppercase, lowercase, propercase. They include:

  1. Upper case-converts text from lower case to uppercase syntax=UPPER(Cell_reference)
  2. Lower case-Converts text to lowercase Syntax: =LOWER(cell_reference)
  3. Trim Function-Removes extra spaces from text except single spaces between words Syntax: =TRIM(text)
  4. Length Function-Counts the number of characters in a text string Syntax: =LEN(text)
  5. Left Function-Extracts a specified number of characters from the left side of a text Syntax: =LEFT(text, num_chars)
  6. Right Function-Extracts a specified number of characters from the right side of a text Syntax: =RIGHT(text, num_chars)
  7. Mid Function-Extracts text from the middle of a text string Syntax: =MID(text, start_num, num_chars)
  8. Concate-Joins two or more text strings together Syntax:=CONCAT(text1, text2, …)
  9. Proper case-Converts text to proper case (first letter of each word capitalized) Syntax: =PROPER(cell_reference)

Aggregate Functions

These functions are used to perform calculations on a group of values. They are performed on numerical values. They include:

  1. Sum-Performs a summation for a range. Syntax:=SUM(range)
  2. Average-Calculates the mean of a range. Syntax:=AVERAGE(range)
  3. Count-Counts cells that contain numbers.Syntax:=COUNT(range)
  4. Counta-Counts cells that are not empty. Syntax:=COUNTA(range)
  5. Max-It returns the maximum value in a range. Syntax:=MAX(range)
  6. Min-It returns the minimum value in a range. Syntax:=MIN(range) ILLUSTRATION: SUM FUNCTION The highlighted part is the selected range, Once you press enter excel will calculate and display the result of the sum of the above range.

Conditional Aggregation

These functions calculate totals, counts or averages based on one or more conditions.
E.g Total sales based on region = Nairobi where count of sales<10
They include:

1.SUMIF-Adds values with one condition. Syntax:=SUMIF(range,criteria,sum_range)

  1. SUMIFS-Adds values with multiple conditions. Syntax:=SUMIFS(sum_range, criteria_range1, criterion1, [criteria_range2, criterion2], ...).
  2. COUNTIF-Counts the number of cells that satisfy a certain condition. Syntax:=COUNTIF(Range,criteria)
  3. COUNTIFS-Counts the number of cells that satisfy multiple conditions. Syntax:=COUNTIFS(criteria_range1, criterion1, [criteria_range2, criterion2], ...).
  4. AVERAGEIF-Calculates the mean of values based on one condition. Syntax:=AVERAGEIF(range, criteria, [average_range]),
  5. AVERAGEIFS-Calculates the mean of values based on multiple conditions. Syntax:=AVERAGEIFS(average_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...)

ILLUSTRATION:

SUMIFS ILLUSTRATION
Once you press enter, Excel will calculate the Sum based on the condition highlighted in blue which is the sales departement.

Logical Functions

Logical functions automate decision-making. What they do is: compare ideas, Test conditions, return true or false based on specific results.
They Include

  1. IF Functions-Perform Logical test and returns false based on the decision.
    syntax: =IF(Logical_test, Value_if_true,value_if_false)
    E.g =IF(E2>80000, "High","Low"

  2. Nested IF- Used when you have more than two conditions
    it's an if inside an if( therefore nested if)
    Syntax:=IF(condition1, value_if_true1, IF(condition2, value_if_true2, value_if_false_all))
    E.g A scenario whereby you are required to categorise employees by age as follows: highly experienced (>30), moderately experienced(20-30), low experinced(10-20),very low experience(less than 10).
    Approach:

  • Insert another column to your data by right-clicking on the header, then select insert. You can then name it Experience level

  • At the formula bar you can use the following formula:
    =IF(P>30, "Highly Experience",IF (P2>=20 "Moderately Experienced", IF (P2>=10 "Low Experienced", IF(P2<10 "Very Low Experience")))) once you press enter, excel will categorise the employees based on the following criteria.
    ILLUSTRATION:

Nested IF

Nested IF
After autofilling:

Nested IF

  1. AND FUNCTION-Returns true if all conditions are true. Syntax:=IF(AND(Conditional1,Conditional2) Example:=IF(AND(C2>50,D2>5),"Eligible",Not Eligible") In this case all conditions have to be met.

4.OR Function-Returns true if any condition is true
E.g=IF(OR(C2>50,D2>5),"Eligible",Not Eligible")
In this case only one condition needs to be met.

VLOOKUP VS HLOOKUP

VLOOKUP-it searches for a value in the first column of a range and returns a value from another column.
It functions by searching for a value vertically (down a column).
It looks in the first column of the table.
Then returns the value from another column in the same row.

Syntax:=VLOOKUP(lookup_value, Table_Array, Col_Index_Num, [Range_LookUP]
Where: lookup_value = Value
Table_Array = Entire Table
Col_Index_Num = Column number
[Range_LookUP] = setup to False

LOOKUP-Stands for horizontal lookup. It searches for a value in the first row of a table and returns a value from the same column in a row you specify.

Syntax:=HLOOKUP(lookup_value, Table_Array, Col_Index_Num, [Range_LookUP]
SCENARIO:
How much is the bonus for employee 108759 =HLOOKUP(108759,A1:A1867,13,False)
What is the marital status for employee 10622=HLOOKUP(10622,A1:A1867,14,False)
NB To use both of these tools, you need to have an idea of what you are searching for. This enables you to find other related information easily.

INDEX AND MATCH

INDEX-Refers to organizing data so that excel can quickly locate and retrieve specific values.
Syntax:=INDEX(Array,row_num,[Col_num]
INDEX WITH MATCH-
This Combination replaces VLOOKUP & HLOOKUP, by enabling you to search anywhere.
Syntax:=INDEX(Return_range,**match**(Lookup_value,Lookup_range,0))
You simply index what you are trying to find:
E.g.=INDEX(F2:F877,**match(10871,A2:A877,0))
ILLUSTRATION:Find the department for employee 10871

Index and match
The result will be:

Results
As you can see the result is marketing department.

Conclusion:

In the above guide, we have only focused on the basic functionalities of Excel. As a beginner understanding the full potential of Excel is essential towards the growth of your data analytics career. Therefore by starting small through practicing, you will be able to master these and more concepts easily.

Top comments (0)