DEV Community: Loryne Joy Omwando

RAG FOR DUMMIES

Loryne Joy Omwando — Sun, 14 Sep 2025 17:34:01 +0000

When I first heard the term RAG (Retrieval-Augmented Generation), I honestly thought it was another one of those intimidating machine learning buzzwords that only AI researchers could understand. But after digging deeper, I realized RAG is actually a very practical concept—one that makes Large Language Models (LLMs) like GPT smarter, more accurate, and much more useful. If you’re new to AI, Machine Learning, or just curious about how modern AI systems answer questions so effectively, this article is for you.

What is RAG?

At its core, Retrieval-Augmented Generation (RAG) is a technique that combines two worlds:

Retrieval → Searching for relevant information from a knowledge base or database.
Generation → Using a language model to create a human-like answer.

Instead of expecting an LLM to "memorize" the entire internet during training, RAG gives it the ability to look things up in real time, and then use that retrieved information to generate better answers.

Why is RAG Needed?

LLMs are powerful, but they have two major limitations:

Knowledge cutoff → They can’t know anything beyond the data they were trained on.
Hallucination → They sometimes make up answers confidently, even when wrong.

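RAG mitigates these issues by connecting the model to an external knowledge source (like a vector database, Wikipedia, or your company’s documents). Instead of relying only on what it memorized during training, the model first retrieves relevant facts and then grounds its response in them, which reduces (but does not eliminate) hallucination.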

Think of it like this: without RAG, an LLM is like a student trying to take an exam with no notes. With RAG, the student is allowed to bring reference books into the exam hall.

How RAG Works (Step by Step)

User asks a question → e.g., “What are the symptoms of diabetes?”
Retriever fetches documents → The system searches a knowledge base (medical docs, Wikipedia, etc.) and pulls relevant passages.
Generator creates an answer → The LLM uses both the retrieved docs and its own language ability to craft a final response.

This makes the answer both accurate and well-written.
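The three steps above can be sketched in plain Python. This is only a toy illustration under stated assumptions: the three-sentence knowledge base, the word-count similarity, and the helper names (`retrieve`, `build_prompt`) are all hypothetical, and the final call to an LLM is left out. Real systems usually use embeddings and a vector database for the retrieval step.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would query a vector database.
DOCUMENTS = [
    "Common symptoms of diabetes include increased thirst, frequent urination and fatigue.",
    "Malaria is transmitted through the bite of infected Anopheles mosquitoes.",
    "Regular exercise helps lower blood pressure and improves heart health.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, top_k=1):
    """Step 2: rank documents by similarity to the question and keep the best ones."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Step 3 (first half): put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

question = "What are the symptoms of diabetes?"   # Step 1: the user's question
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)  # this prompt would then be sent to the LLM of your choice
```

The key design idea is that the model never has to "remember" the diabetes facts: they arrive inside the prompt at question time.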

Example in Python (Simplified)

Here’s a minimal example using Hugging Face’s transformers library with a RAG model:

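from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Load tokenizer, retriever, and model.
# A "rag-token" checkpoint pairs with RagTokenForGeneration (not RagSequenceForGeneration).
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-base")
# use_dummy_dataset=True loads a small sample index instead of the full Wikipedia index,
# so the answer may be rough. The retriever also needs the `datasets` and `faiss` packages.
retriever = RagRetriever.from_pretrained("facebook/rag-token-base", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-base", retriever=retriever)

# Encode question
question = "Who is the president of Kenya in 2025?"
inputs = tokenizer(question, return_tensors="pt")

# Generate answer (the retriever is called inside generate)
outputs = model.generate(input_ids=inputs["input_ids"])
answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(answer)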

This code:

Takes a question.
Retrieves relevant docs from a database.
Generates a natural language answer using the docs + the model.

Where is RAG Used?

RAG is not just theory—it’s already powering many real-world applications:

Chatbots & Virtual Assistants → They fetch accurate info from knowledge bases.
Customer Support → Agents use RAG systems to quickly answer FAQs from company docs.
Healthcare → Doctors can query medical databases for up-to-date insights.
Education → Students can ask questions, and the system cites textbooks or research papers.

Benefits of RAG

Keeps answers up to date
Reduces hallucinations
Can handle specialized knowledge (finance, healthcare, law)
More efficient than training a massive LLM from scratch

Challenges of RAG

Of course, RAG is not perfect:

Requires a well-organized knowledge base.
Retrieval quality matters—a bad retriever means bad answers.
More computationally expensive than using just a plain LLM.

Final Thoughts

RAG is a game-changer. Instead of forcing AI models to know everything, we let them fetch knowledge as needed. It’s like giving AI both memory (retrieval) and intelligence (generation). As someone currently learning Data Science and AI, I see RAG as one of the most practical bridges between machine learning theory and real-world applications.

If you’re diving into AI, understanding RAG will definitely give you an edge—not only in technical projects but also in appreciating how modern AI systems are evolving.

Balancing Type I and Type II Errors in Medical Decisions: A Kenyan Perspective

Loryne Joy Omwando — Fri, 29 Aug 2025 13:20:39 +0000

When we study statistics, we often hear about Type 1 and Type 2 errors. But in real life, especially in medicine, these errors are not just theoretical—they can literally mean the difference between life and death. Understanding where to trade off these errors is crucial for doctors, public health policymakers, and even patients making informed decisions.

Understanding Type 1 and Type 2 Errors

Type 1 Error (False Positive): This occurs when we conclude that something is true when it is actually false (in hypothesis-testing terms, rejecting a null hypothesis that is true). In medical terms, it’s diagnosing a patient with a disease they don’t have.
Type 2 Error (False Negative): This happens when we fail to detect something that is true (failing to reject a null hypothesis that is false). Medically, it’s missing a diagnosis for a patient who actually has the disease.

The trade-off between these errors often feels like a tightrope walk. Reducing one can increase the other, and vice versa. So, how do we make this decision?

A Medical Scenario in Kenya: Malaria Testing

Imagine you are a clinician in Kisumu, where malaria is prevalent. You have a diagnostic test for malaria with 95% accuracy. Now, consider the implications of each error type:

Type 1 Error (False Positive): You diagnose malaria in someone who doesn’t have it. The patient might receive unnecessary antimalarial drugs, which can lead to side effects and contribute to drug resistance.
Type 2 Error (False Negative): You miss malaria in a patient who actually has it. This patient may not receive treatment in time, leading to severe complications or even death.

Clearly, in this scenario, Type 2 errors are more dangerous. Therefore, it’s safer to accept a slightly higher rate of Type 1 errors (false positives) to minimize Type 2 errors (false negatives).
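To put rough numbers on the two error types, here is a small sketch. It rests on assumptions that are not in the scenario above: "95% accuracy" is read as 95% sensitivity and 95% specificity, and the clinic sees 1,000 patients of whom 30% truly have malaria. Both figures are hypothetical and chosen only for the arithmetic.

```python
def expected_outcomes(n_patients, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for a diagnostic test."""
    sick = n_patients * prevalence
    healthy = n_patients - sick
    true_pos = sick * sensitivity        # sick patients correctly detected
    false_neg = sick - true_pos          # Type 2 errors: missed cases
    true_neg = healthy * specificity     # healthy patients correctly cleared
    false_pos = healthy - true_neg       # Type 1 errors: false alarms
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }

# Hypothetical clinic: 1,000 patients, 30% prevalence, 95% sensitivity and specificity
counts = expected_outcomes(1000, 0.30, 0.95, 0.95)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

Under these assumptions, about 15 of the 300 infected patients would be missed (Type 2) and about 35 of the 700 healthy patients would be treated unnecessarily (Type 1).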

Visualizing the Trade-Off in Python

We can simulate this trade-off using Python. Let’s assume we adjust the threshold of a diagnostic test and observe how Type 1 and Type 2 errors change.

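import numpy as np
import matplotlib.pyplot as plt

# Simulate 1,000 patients with 10% disease prevalence
rng = np.random.default_rng(42)
n_patients = 1000
true_disease = rng.binomial(1, 0.1, n_patients)

# Simulated test score between 0 and 1: infected patients tend to score higher.
# The Beta parameters are illustrative, not taken from real malaria test data.
scores = np.where(true_disease == 1,
                  rng.beta(5, 2, n_patients),
                  rng.beta(2, 5, n_patients))

# A patient is called positive when their score reaches the decision threshold
thresholds = np.linspace(0, 1, 100)
false_positives = []
false_negatives = []

for t in thresholds:
    predictions = scores >= t
    # Type 1 rate: healthy patients flagged positive, as a share of all healthy patients
    fp = np.sum(predictions & (true_disease == 0)) / np.sum(true_disease == 0)
    # Type 2 rate: infected patients flagged negative, as a share of all infected patients
    fn = np.sum(~predictions & (true_disease == 1)) / np.sum(true_disease == 1)
    false_positives.append(fp)
    false_negatives.append(fn)

plt.plot(thresholds, false_positives, label='Type 1 Error (FP)')
plt.plot(thresholds, false_negatives, label='Type 2 Error (FN)')
plt.xlabel('Decision Threshold')
plt.ylabel('Error Rate')
plt.title('Trade-Off Between Type 1 and Type 2 Errors')
plt.legend()
plt.show()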

From the plot, we can visually pick a threshold where Type 2 errors are minimized, even if Type 1 errors increase slightly.

Making Decisions

The trade-off between Type 1 and Type 2 errors depends on context and consequences. In medical diagnostics, the severity of missing a disease often outweighs the inconvenience of a false alarm. In our malaria example, it is reasonable to tolerate some false positives to avoid missing actual malaria cases.

In other contexts, such as a drug side effect study, you might want to minimize Type 1 errors to prevent falsely claiming a drug is harmful when it isn’t. The key is to carefully weigh the risks and consequences before deciding on the acceptable balance.
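One way to make "weighing the consequences" concrete is to attach a cost to each error type and pick the threshold with the lowest total cost. The sketch below is only an illustration: the three candidate thresholds, their error rates, and the cost weights are hypothetical numbers, not measurements.

```python
# Hypothetical (threshold, false-positive rate, false-negative rate) triples
candidates = [
    (0.3, 0.30, 0.02),
    (0.5, 0.15, 0.08),
    (0.7, 0.05, 0.25),
]

def best_threshold(candidates, cost_fp, cost_fn):
    """Return the threshold with the lowest weighted error cost."""
    def weighted_cost(candidate):
        _, fp_rate, fn_rate = candidate
        return cost_fp * fp_rate + cost_fn * fn_rate
    return min(candidates, key=weighted_cost)[0]

# Malaria-style setting: a missed case is assumed to be 10x worse than a false alarm
malaria_choice = best_threshold(candidates, cost_fp=1, cost_fn=10)

# Side-effect-study setting: a false alarm is assumed to be 10x worse than a miss
side_effect_choice = best_threshold(candidates, cost_fp=10, cost_fn=1)

print("Malaria screening threshold:", malaria_choice)
print("Side-effect study threshold:", side_effect_choice)
```

With these made-up numbers, the malaria setting prefers the lowest threshold (more false alarms, fewer misses), while the side-effect setting prefers the highest one.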

Understanding Type 1 and Type 2 errors is not just an academic exercise. It’s a vital part of making informed decisions, especially in healthcare. In Kenya, where resources and access to medical care vary, making the right trade-off can save lives.

Understanding Supervised Learning: A Deep Dive into Classification

Loryne Joy Omwando — Fri, 22 Aug 2025 06:21:29 +0000

Machine Learning has always sounded challenging and technical. But the more I study it, the more I realize it’s simply about teaching computers to learn from data. One of the most important branches of Machine Learning I’ve been exploring lately is Supervised Learning, and in this post I want to focus specifically on classification, sharing what I’ve learned so far, the models I’ve used, and some of the challenges I’ve faced as a student diving into this fascinating world.

What is Supervised Learning?

As a former teacher, I have realised that supervised learning is like teaching a child using flashcards. You show them an apple, tell them “this is an apple,” and do the same with oranges, bananas, and so on. Over time, they start recognizing fruits on their own.

In the same way, supervised learning uses labeled data—meaning the input data already comes with the correct answers (labels). The algorithm studies this relationship and later predicts labels for unseen data.

For example:

If we feed a model patient data (like age, blood pressure, sugar levels) with labels (“diabetic” or “not diabetic”), the model learns to classify new patients into these categories.

How Classification Works

Classification is all about sorting things into groups. The data has features (inputs), and the task is to predict which class (output) each data point belongs to.

Here’s the step-by-step way I think about it:

Collect and label data – You need a dataset where the right answers (classes) are already known.
Train the model – Feed this data to an algorithm so it can learn the relationship between features and labels.
Test the model – Check how well it predicts on unseen data.
Deploy – Use it to make real-world decisions.

A simple example from daily life: Gmail classifying emails into Spam or Not Spam. That’s binary classification. More complex tasks, like classifying animals into cats, dogs, or birds, are called multi-class classification.

Types of Classification

When we first started learning about classification in class, I found it really helpful to understand that classification itself has different types:

Binary Classification – Only two classes (e.g., spam vs not spam).
Multi-Class Classification – More than two classes, but each data point belongs to just one (e.g. classifying fruits into apple, banana, or mango).
Multi-Label Classification – Each data point can belong to multiple categories at once (e.g. tagging a photo as “beach,” “sunset” and “friends”).
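A quick way to see the difference between the three types is to look at what the labels themselves look like. The sketch below uses made-up labels; the helper `to_indicator_rows` is a hypothetical name introduced only for this illustration.

```python
# Hypothetical labels illustrating the three classification types
binary_labels = ["spam", "not spam", "spam"]        # exactly two possible classes
multi_class_labels = ["apple", "banana", "mango"]   # one class per item, 3+ possible classes
multi_label_tags = [                                # several tags can apply to one item
    {"beach", "sunset"},
    {"friends"},
    {"beach", "sunset", "friends"},
]

def to_indicator_rows(tag_sets, classes):
    """Turn each set of tags into a 0/1 row with one column per class."""
    return [[1 if c in tags else 0 for c in classes] for tags in tag_sets]

classes = ["beach", "sunset", "friends"]
indicator_rows = to_indicator_rows(multi_label_tags, classes)
for row in indicator_rows:
    print(row)
```

In the multi-label case each photo gets a whole row of yes/no answers, whereas binary and multi-class problems have exactly one label per item.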

Understanding these types cleared up a lot of confusion for me when I was getting started!

Common Models Used for Classification

While exploring classification, I came across several algorithms, each with its own strengths and weaknesses:

Logistic Regression – Despite its name, it’s actually used for classification.
Decision Trees – Easy to understand and interpret.
Random Forests – A collection of decision trees working together.
Support Vector Machines (SVMs) – Great at finding boundaries, but can be computationally heavy.
k-Nearest Neighbors (kNN) – Looks at “neighbors” to decide the class.
Neural Networks – Super powerful for complex tasks like image and speech.
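To show how simple the idea behind one of these models can be, here is a bare-bones kNN classifier in plain Python. It is a sketch for intuition only: the fruit measurements are hypothetical, and in practice you would reach for scikit-learn's `KNeighborsClassifier` rather than writing this yourself.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical fruit data: (weight in grams, diameter in cm)
points = [(150, 7.0), (160, 7.5), (155, 7.2), (120, 3.5), (118, 3.4), (125, 3.6)]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

first_guess = knn_predict(points, labels, (152, 7.1))
second_guess = knn_predict(points, labels, (121, 3.5))
print(first_guess, second_guess)
```

There is no real "training" step here: the model just stores the labeled examples and compares each new point against them, which is why kNN gets slow on large datasets.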

💻 A Simple Python Example

When I was first learning classification, writing the code felt overwhelming. But once I discovered scikit-learn, things clicked. Here’s a simple example using Logistic Regression on the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict & evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Seeing the accuracy printed out for the first time was a fulfilling moment for me: the computer had actually “learned” from data!

My Personal Insights

As a student in Machine Learning, I find classification exciting because it’s so close to real life. Every day, we make classifications in our minds: deciding if a matatu is too full, whether a mango is ripe, or even whether it will rain judging from the sky.

One big lesson: no algorithm is a silver bullet. Sometimes a simple logistic regression beats a fancy neural network, depending on the dataset.

Challenges I’ve Faced

It hasn’t been smooth sailing. Here are some of the struggles I’ve encountered while working with classification:

Understanding the Types of Supervised Learning: Differentiating between classification and regression was tough at first. Writing their Python code also felt intimidating because I didn’t know which libraries to use.
Data Quality: Missing values, duplicates, or wrong labels can ruin everything.
Overfitting: My decision trees once performed perfectly on training data but terribly on test data.
Computational Resources: Neural networks are amazing, but without a good GPU they can be painfully slow, so I rely on Google Colab.

Conclusion

Supervised learning, and classification in particular, has given me a new appreciation of how data can drive intelligent decisions. From simple logistic regression to powerful neural networks, the journey of trying, failing, debugging, and improving has taught me not just technical skills but also patience.

I’m still learning, but one thing is clear: classification is not just about algorithms—it’s about asking the right questions, preparing the right data, and interpreting results responsibly.

I’d love to hear your thoughts in the comments under this article!

⚽ Predicting 2024/25 Premier League Win Probabilities Using Python

Loryne Joy Omwando — Wed, 30 Jul 2025 21:56:36 +0000

In this project, I explored how to predict the probability of Premier League teams winning games in the 2024/25 season, using their 2023/24 results as a baseline. I used Python, the API-Football API, and some light statistics to model each team's win probability.

Let’s break it down 👇

📊 The Goal

Predict how many games each team is likely to win in the 2024/25 season using:

🧮 Bernoulli Distribution (win or no win)
🎲 Binomial Probability Model
📈 Visualizations with Seaborn & Matplotlib

⚙️ Tools & Libraries

import requests
import pandas as pd
from scipy.stats import binom
import matplotlib.pyplot as plt
import seaborn as sns

📦 Step 1: Pull 2023/24 Match Data from API-Football

I used the API-Football service to get Premier League match data for the 2023/24 season:

API_KEY = 'your_api_key'
BASE_URL = 'https://v3.football.api-sports.io'
HEADERS = {'x-apisports-key': API_KEY}

params = {'league': 39, 'season': 2023}
response = requests.get(f'{BASE_URL}/fixtures', headers=HEADERS, params=params, timeout=30)
response.raise_for_status()  # stop early on HTTP errors (e.g. a bad API key)
fixtures = response.json()['response']

🔍 Step 2: Process the Results

Each match was inspected to determine the winning team, and I counted how many matches each team played and won.

data = []
for match in fixtures:
    if match['fixture']['status']['short'] == 'FT':
        home_team = match['teams']['home']['name']
        away_team = match['teams']['away']['name']
        home_goals = match['goals']['home']
        away_goals = match['goals']['away']

        if home_goals > away_goals:
            winner = home_team
        elif away_goals > home_goals:
            winner = away_team
        else:
            winner = None  # draw

        data.append({'home': home_team, 'away': away_team, 'winner': winner})

df = pd.DataFrame(data)

📊 Step 3: Calculate Win Probabilities

I grouped matches by team, calculated win rates (wins / games), and used the Binomial PMF to estimate each team's chance of winning a given number of games in a 38-match season.

teams = list(set(df['home']).union(set(df['away'])))
records = []

for team in teams:
    played = df[(df['home'] == team) | (df['away'] == team)]
    wins = (df['winner'] == team).sum()
    win_rate = wins / played.shape[0]
    records.append({'team': team, 'wins': wins, 'played': played.shape[0], 'win_rate': win_rate})

df_stats = pd.DataFrame(records)
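For readers who have not met the Binomial PMF before, the short sketch below computes it by hand with the standard library, so the formula is visible. The 50% win rate is a hypothetical value rather than any team's real figure, and the model assumes every match is an independent trial with the same win probability.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent games with win probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

season_games = 38
win_rate = 0.5  # hypothetical team that won half of last season's games

expected_wins = season_games * win_rate
prob_exactly_19 = binomial_pmf(19, season_games, win_rate)
total_probability = sum(binomial_pmf(k, season_games, win_rate) for k in range(season_games + 1))

print(f"Expected wins: {expected_wins:.1f}")
print(f"P(exactly 19 wins): {prob_exactly_19:.4f}")
print(f"Sum over all outcomes: {total_probability:.4f}")
```

`scipy.stats.binom.pmf` performs the same calculation. Even the single most likely outcome has a fairly small probability, which is why it helps to look at the whole distribution rather than one number.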

📈 Step 4: Visualize the Prediction

I used Seaborn to create a line plot for each team showing the probability distribution of their possible wins in the next season (assuming 38 games).

season_games = 38
plot_data = []

for _, row in df_stats.iterrows():
    team = row['team']
    p = row['win_rate']

    for x in range(0, season_games + 1):
        prob = binom.pmf(x, n=season_games, p=p)
        plot_data.append({'team': team, 'wins': x, 'probability': prob})

viz_df = pd.DataFrame(plot_data)

plt.figure(figsize=(14, 8))
sns.lineplot(data=viz_df, x='wins', y='probability', hue='team')
plt.title('Predicted Win Probability Distribution (2024/25 Season)')
plt.xlabel('Number of Wins')
plt.ylabel('Probability')
plt.tight_layout()
plt.show()

🧠 Why This Matters

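This approach doesn’t predict exact results. Instead, it gives a probability profile for each team under a simplifying assumption: every match is an independent trial with the same win probability the team had last season.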
It’s helpful for analysts and fans to understand team performance trends.
One can improve this model by adding player-level data, home/away effects, injuries, or transfer impact.

💭 Final Thoughts

This was a fun exploration that blended sports and data science. Using historical data with probability theory gives deeper insights than just "gut feeling."

📂 GitHub: [https://github.com/loryneJoy/Python-Assignments.git]

🐍 Tags: #football #python #data-science #premier-league

📊 The Measures of Central Tendency and Why They Matter in Data Science

Loryne Joy Omwando — Sun, 20 Jul 2025 19:54:44 +0000

Have you ever stared at a dataset and wondered: “Okay... but what does all this really mean?”
Welcome to the world of central tendency—your first step in summarizing data and making it speak.

Whether you're a growing data analyst or a seasoned data scientist, understanding the core of your data starts here.

🧠 What Are Measures of Central Tendency?

In simple terms, measures of central tendency help us find the middle point or typical value in a dataset. These measures include:

Mean – the average
Median – the middle value
Mode – the most frequent value

Think of them like lenses: each one shows the data in a slightly different way.

🧺 Why Are They Important in Data Science?

Raw data can be messy, overwhelming, and misleading without context.

When working with data, especially during exploratory data analysis (EDA), these measures help us:

Summarize large datasets with a single number
Detect outliers and understand their impact
Choose appropriate models (some ML algorithms assume normal distribution)
Communicate insights clearly to stakeholders who aren’t tech-savvy

Here are some practical examples 👇

📌 The Mean – "The Classic Average"

import numpy as np

salaries = [40000, 45000, 50000, 52000, 60000]
mean_salary = np.mean(salaries)
print(f"The average salary is: ${mean_salary:.2f}")

💡 But beware! The mean is sensitive to outliers.

What happens if we introduce a wildly high salary?

salaries.append(200000)  # Big CEO bonus!
mean_salary = np.mean(salaries)
print(f"New average salary: ${mean_salary:.2f}")

The average gets pulled up, even though most employees earn much less.

📌 The Median – "The Middle Ground"

median_salary = np.median(salaries)
print(f"The median salary is: ${median_salary:.2f}")

The median resists outliers, making it a better choice when the data is skewed.

👈 For example, in real estate prices, income levels, or housing rent, the median gives a fairer picture.
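If you want to flag a value like the 200,000 salary automatically, one common rule of thumb is the 1.5 x IQR rule. This is an addition for illustration, not something the examples above rely on, and the choice of 1.5 and of the "inclusive" quartile method are conventions rather than the only option.

```python
from statistics import quantiles

salaries = [40000, 45000, 50000, 52000, 60000, 200000]

# Quartiles: the values that cut the sorted data into four equal parts
q1, _, q3 = quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the spread of the middle 50%

# Anything more than 1.5 x IQR outside the quartiles is flagged as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print("Outliers:", outliers)
```

Like the median, the quartiles barely move when one extreme value is added, which is what makes this rule useful on skewed data.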

📌 The Mode – "The Most Popular Kid"

from statistics import mode

grades = [85, 90, 88, 85, 92, 85, 90]
most_common_grade = mode(grades)
print(f"The most common grade is: {most_common_grade}")

The mode is especially useful for categorical data, like:

Most purchased product
Favorite programming language
Most common diagnosis in a hospital dataset
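Categorical data often has ties, and `statistics.mode` returns only one value. A small sketch with a hypothetical survey of favorite languages shows `statistics.multimode`, which returns every value tied for most frequent (available in Python 3.8 and later).

```python
from statistics import multimode

# Hypothetical survey answers: Python and SQL are tied for first place
favorite_languages = ["Python", "SQL", "Python", "R", "SQL"]

top_languages = multimode(favorite_languages)
print("Most popular:", top_languages)
```

Checking for ties this way avoids reporting a single "most popular" category when two or more share the lead.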

📉 When to Use Which?

Measure	Best For	Avoid When
Mean	Symmetric distributions	Data has outliers
Median	Skewed data or outliers	Uniform distributions
Mode	Categorical data	Continuous variables with few or no repeats

🔍 Real-Life Use Case: House Prices

Imagine you’re analyzing house prices in Nairobi:

house_prices = [1_000_000, 1_200_000, 1_300_000, 10_000_000]  # 👀 big outlier!

print("Mean:", np.mean(house_prices))
print("Median:", np.median(house_prices))

Which one would you trust more to describe a "typical" house price?
Definitely the median—because that luxury mansion isn't your average listing.

🧠 Final Thoughts

Mastering central tendency is more than just memorizing formulas.

It’s about knowing which tool to use, when to use it, and why. Data Science isn't just about models and code—it's about context and communication.

So the next time you are handed a CSV file full of numbers, don’t panic. Instead:

Start with the basics.
Start with central tendency.

✅ TL;DR

Mean = average (useful, but sensitive to outliers)
Median = middle value (great for skewed data)
Mode = most frequent value (perfect for categories)
Use them in EDA, data summaries, and to build intuition

Thanks for reading! 🙌
If you found this helpful, let’s connect or discuss below:
What’s your go-to measure when you explore new data?

How I Built an RCPA Prescription Performance Dashboard in Power BI

Loryne Joy Omwando — Thu, 10 Jul 2025 14:43:40 +0000

Recently, I completed a rewarding Power BI project that involved transforming raw Retail Chemist Prescription Audit (RCPA) data into an interactive dashboard that provides deep business insights. The challenge wasn't just in visualizing the data, but in cleaning, transforming, modeling, and telling a data-driven story that stakeholders could act upon.

In this article, I’ll walk you through how I tackled the project from start to finish, including:

ETL in Power Query
Data modeling and relationships
Key DAX measures
Designing visuals for insights

🗂️ Project Overview

Goal: Create a dynamic Power BI dashboard to analyze prescription performance by doctor, brand, region, and medical rep, and to understand doctor conversion and brand competition trends.

Key Objectives:

Clean and transform raw RCPA data
Build a structured data model with relationships
Generate insightful visuals using DAX and Power BI visuals
Help business users track brand performance and doctor behavior

📦 Dataset Summary

The dataset included four main tables:

RCPA Reporting Form: Raw data on doctor prescriptions
Product Master: Product and brand metadata
Brand Targets: Expected prescription targets
Expected Transformation Sheet: Data transformation guide

🧼 Step 1: ETL with Power Query

Using Power Query Editor, I cleaned and transformed the raw datasets into analytics-ready tables:

🔹 Cleaning Tasks:

Removed duplicates and missing values
Converted text-based numbers (e.g., "KSh 1,000") to numeric format
Standardized column names and data types
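The project did this cleaning in Power Query, and those steps are not reproduced here. To show the idea behind the text-to-number conversion, here is the same logic sketched in Python; the function name `parse_amount` and the sample strings are hypothetical.

```python
import re

def parse_amount(text):
    """Extract a number from text such as 'KSh 1,000'; return None if there is none."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))  # drop thousands separators

# Hypothetical raw values as they might appear in an exported sheet
raw_values = ["KSh 1,000", "KSh 2,500.50", "N/A", "750"]
cleaned_values = [parse_amount(v) for v in raw_values]
print(cleaned_values)
```

Returning `None` for unparseable entries keeps missing values visible, so they can be handled deliberately instead of silently becoming zero.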

🔹 Transformation Tasks:

Merged Product Master with RCPA Reporting Form to enrich product info
Created RCPA Data Table with relevant metrics (Brand, Doctor, Med Rep)
Created Competitor RCPA Data Table for competitor comparisons
Aggregated prescription counts and values as needed

This step ensured clean, structured data that could be used reliably in the data model and visuals.

🧠 Step 2: Building the Data Model

I designed a star schema where:

Fact tables: RCPA Data and Competitor RCPA Data
Dimension tables: Product Master and Brand Targets

🔁 Relationships Created:

Product Master ➝ RCPA Data (based on product/brand)
Brand Targets ➝ RCPA Data (to compare actual vs. target Rx)
Product Master ➝ Competitor RCPA Data (for brand competition)

All relationships were tested and configured with correct cardinality and filter directions.

📈 Step 3: Visualizing Insights

With the model in place, I designed a clean and interactive dashboard in Power BI, which included:

🎯 Visuals Built:

1. Doctor Prescription (Rx) Performance

Bar/Column charts to show prescription volume per doctor vs. brand targets
Filterable by Region and Medical Rep

2. Doctor Conversion Status

Used DAX to calculate if a doctor met or exceeded target prescriptions for 3+ consecutive RCPA periods
Displayed with icons and color-coded status indicators
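The dashboard implements this rule in DAX, which is not shown in full. The underlying logic (a streak of 3 or more periods at or above target) can be sketched in Python as follows; the function name and the sample numbers are hypothetical.

```python
def is_converted(actuals, targets, required_streak=3):
    """Return True if actual Rx met the target in `required_streak` consecutive periods."""
    streak = 0
    for actual, target in zip(actuals, targets):
        # Extend the streak when the target is met, otherwise start over
        streak = streak + 1 if actual >= target else 0
        if streak >= required_streak:
            return True
    return False

# Hypothetical prescriptions per RCPA period against a constant target of 10
targets = [10, 10, 10, 10]
doctor_a = is_converted([10, 12, 15, 9], targets)   # met target in periods 1-3
doctor_b = is_converted([10, 8, 12, 15], targets)   # streak broken in period 2
print(doctor_a, doctor_b)
```

Resetting the counter on any missed period is what distinguishes "3 consecutive periods" from simply "3 periods in total".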

3. Brand Competition

Stacked column charts comparing our brand’s performance against competitors
Segmented by region and product category

Find link to project repository 👉 here

🔢 Key DAX Measures

Some example DAX measures used:


Total Rx = SUM('RCPA Data'[Prescription Quantity])

Target Met =
IF(
    [Total Rx] >= SUM('Brand Targets'[Target Qty]),
    "Yes",
    "No"
)

// Doctor Conversion: custom logic to track 3 consecutive periods (not shown here)

HOW EXCEL IS USED IN REAL-WORLD DATA ANALYSIS

Loryne Joy Omwando — Mon, 09 Jun 2025 18:51:48 +0000

INTRODUCTION TO EXCEL: A Data Analyst’s Multi-Purpose Tool

My perception of Excel changed when I enrolled in a Data Analytics course at LuxDevHQ. Before that, I saw Excel as a tool for basic calculations, lists, budgets and schedules. By working with it, I have learned that Excel is much more than rows and columns; it is about generating actionable insights and making informed decisions.

Excel is an essential and effective tool for analyzing and visualizing data. Although there are other data analysis tools such as Python, Tableau and SQL, Excel remains one of the easiest to access and learn because it is readily available on most computers.

What is Data Analysis?
This is the process of examining, cleaning, transforming and modelling data to identify patterns and trends that inform decision making.

What is Data?
Data refers to raw, unprocessed facts that have not yet been cleaned or analyzed.

EXCEL IN THE REAL WORLD

It is remarkable how widely Excel is used around the world and across different sectors.
Financial Reporting and Budgeting: Companies use Excel to analyze quarterly revenue, create detailed financial reports, track spending and forecast future spending.
Business Actions: Excel is used by businesses to monitor important metrics like financial results, customer trends, and sales performance. Users can find trends, compare performance over time, and pinpoint areas for improvement with the help of features like PivotTables, charts, formulas, and conditional formatting.
Marketing Performance Tracking: With Excel, businesses can identify current sales trends, monitor how they change, and calculate return on investment (ROI).

USEFUL EXCEL FEATURES/ FORMULAS

1. VLOOKUP ():
Previously, I did not know it existed, but now I know how useful it is for finding information in large sets of data. VLOOKUP (vertical lookup) is a function that searches for a value in the first column of a table and returns a value from the same row in a column you specify. Its syntax looks like this: =VLOOKUP(lookup value, range containing the lookup value, the column number in the range containing the return value, Approximate match (TRUE) or Exact match (FALSE)).
For example, if column A has item names and column B has item prices in the range A2:B10, the formula below finds "Calculator" in column A and returns its price from column B.
=VLOOKUP("Calculator",A2:B10,2,FALSE)
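If you are coming from Python, it can help to see what an exact-match VLOOKUP does step by step. The sketch below is only an analogy: the table contents and the function name `vlookup_exact` are hypothetical, and Excel's real function has more behaviors (such as approximate matching) than this covers.

```python
# Hypothetical price list: first column is the item name, second is the price
price_table = [
    ("Pen", 20),
    ("Calculator", 1500),
    ("Notebook", 120),
]

def vlookup_exact(lookup_value, table, col_index):
    """Scan the first column top to bottom; return the value from col_index (1-based)."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here

calculator_price = vlookup_exact("Calculator", price_table, 2)
print(calculator_price)
```

The 1-based `col_index` mirrors the third argument of VLOOKUP, where 2 means "the second column of the range".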

2. Conditional Formatting:
This feature highlights cells that meet rules you set. (Restricting what can be entered in a cell based on criteria is a separate feature, Data Validation.) For example, you can highlight test scores below 50% by selecting Conditional Formatting > Highlight Cells Rules > Less Than, then choosing a fill colour.

3. Index- Match:
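This is a flexible combination of two functions, INDEX and MATCH, that pulls information such as a client's phone number based on a unique identifier, which is especially handy when working with multiple sheets of data. The general pattern is =INDEX(return range, MATCH(lookup value, lookup range, 0)), where 0 requests an exact match.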

4. Pivot Tables:
Pivot tables are a powerful feature for summarizing data. You can quickly group a large data list by item, region, cost, rating, etc., and then generate charts from the same summary.

PERSONAL REFLECTION

The biggest shift has been realizing that data isn't only for tech experts or statisticians. Before learning and exploring Excel, I saw data as dry and uninteresting. Now I see data, and the process of cleaning and analyzing it, as an unfolding story, and I know that by learning Excel I am building the skills to turn raw data into business value. This gives me a sense of empowerment and encourages me to keep exploring Excel and data analysis in general.