Organizations in healthcare, finance, manufacturing, retail and every other you can think of, use Excel for various reasons. For example, financial institutions like banks use Excel to create budgets, track cashflow or forecast revenue. Retailers may use Excel to track their stock levels or store their suppliers' data. Healthcare organizations may use Excel to store patient data and many other uses.
So What Is Excel?
Excel is a spreadsheet program developed by Microsoft that is used to organize, analyze and even visualize data. It stores data in a tabular format and provides a wide range of formulas, functions and visualization features such as charts which are used in the data analysis process. Despite the increase in advanced analytical tools, Excel is still widely used because it is user friendly, flexible and powerful enough to handle large datasets. It is also a great tool for novice data analysts to start off with in their analytical journey. Being a beginner as well, the first tool in my data analytics toolkit is none other than Excel.
Useful Features in Excel for Data Analysts
Several functions and formulas in Excel are commonly used to transform and analyze data:
Data Cleaning
Data is often 'dirty' and therefore a bit of cleaning is recommended before analyzing it to ensure accuracy. Below is a couple of features I use in my data preparation:
- Find and Replace: You can find this feature in the Home tab or through the shortcut [Ctrl+H]. This feature is used to remove unwanted texts or characters from your dataset.
- TRIM Function: is used to remove extra spaces from text data type. [e.g. =TRIM(A2)] where A2 is the cell reference of the text you want to clean.
- Text to Columns: Found in the Data tab, this feature is used to split data into separate columns. For example, if your dataset contains a field/column with full names and you would like to separate them into different columns, this feature will come in handy.
- Conditional Formatting: Also found in the Home tab in the Excel ribbon. For example, you may have a dataset with duplicate entries say in the name column, use conditional formatting to highlight the cells with duplicates while assessing data in the entire row, if it's also duplicated you can do away with the duplicates and remain with one entry.
- Data Validation: I recommend using this feature when gathering your data, to avoid e.g. 'typos' and any other inconsistencies. This feature found in the Data tab, will help to lighten the data cleaning process due to the preset rules at the point of data entry.
Data Transformation
You want to analyze data that is in a usable format and below as some functions that could be useful in transforming your data:
By Formatting
Once you open your dataset and you are finding it difficult to comprehend what you are looking at e.g. some columns might not be visible or the data cuts off at some point, some cells might be blank or you can't tell if your fields/columns have a title, this is why formatting is recommended to improve readability. In your Home tab, you'll find all formatting tools you need to improve your dataset. You can bolden the column titles to distinguish them from the rest of the dataset, you can autofit the column width to make all cells with data visible and also change the font type and size to your liking and many more.
Using Functions
- ABS: [=ABS(cell reference)] This feature known as absolute, is useful when you want to convert negative values to positive values.
- VALUE: [=VALUE(cell reference)] is used when converting numbers that are in text format to numbers.
- UPPER: [UPPER(cell reference)] This feature is used to convert text that is in small letters to capital letters.
- LOWER: [=Lower(cell reference)] is used to convert text that is in capital letters to small letters.
- PROPER: [=PROPER(cell reference)] This is used to convert the first letter of the text to a capital letter.
There are more Excel features used by Data analysts to transform messy datasets into usable and readable formats. I only mentioned these few for a start that are, in my opinion; easy to grasp for a beginner.
Data Analysis
This is the part where the processed data is analyzed to generate insights that are used in decision making by stakeholders e.g. management, marketing team, healthcare professionals.
A few functions used in data analysis:
- MEAN/AVERAGE: [=AVERAGE(range of cells)]
- MEDIAN: [=MEDIAN(range of cells)]
- MODE: [=MODE(range of cells)]
- SUM: [=SUM(range of cells)]
- COUNT: [=COUNT(range of cells)]
- MAX: [=MAX(range of cells)]
- MIN: [=MIN(range of cells)]
Pivot Tables
When you want fast results, think pivot tables. These are tables used in summarizing data according to the output you are looking of. This is a quick method and tool used in analyzing large datasets. Please note that it does not change the original dataset. It facilitates identifying patterns and trends and comparing metrics across groups or categories to list a few.
To create a pivot table, navigate to the Excel ribbon and click the insert tab and to the left most section of your ribbon click on pivot table and choose where to place it in your workbook.
Time may not always be on your side as an analyst, so pivot tables are a great tool to use when you want to save time from manual calculations listed above, explore your dataset quickly and are also helpful in creating dashboards and reports.
Data Visualization
Stakeholders; these are the people interested in what the data is 'communicating'. You'll be sharing the results with them and most will not be interested in seeing the process it took you to get to the results.
This is why dashboards are essential. While there are tools specifically developed for data visualization, Excel can still manage to handle visualization tasks.
You are able to create charts such as Bar Charts, Column Charts, Scatter Plots, Pie charts, Maps and many more depending on what you need to visualize.
Conclusion
The Data Analytics journey can feel overwhelming in the beginning, You are not alone!. Choosing which tool to start your learning journey can be confusing and the persistent need to learn everything at once will leave you stressed and feeling inept. I've been there!
Pick one tool to learn at a time. It's not about the quantity of knowledge and skill but the quality. Take your time in grasping the concepts, understanding how to question the data, understanding why inconsistencies exist and how to handle them and what different variables represent.
I chose to learn Excel first, and the experience strengthened both my technical and analytical skills. I still have a long way to go in terms of tools I intend to add to my toolkit, but I am now more confident in my ability to work with data.
Top comments (0)