DEV Community: Wanjiru-Njoroge

Understanding Type I and Type II Errors in a Medical Setting

Wanjiru-Njoroge — Tue, 12 Aug 2025 16:55:50 +0000

When it comes to healthcare, accuracy is critical. We rely on doctors and medical tests to give us the right answers about our health. Unfortunately, mistakes do happen. In statistics, these mistakes are often described as Type I errors (false positives) and Type II errors (false negatives). Understanding these errors can help patients and healthcare providers make better decisions and improve testing accuracy.

The Four Possible Outcomes of a Medical Test

In a hospital scenario, there are four possible outcomes when a patient is tested for a disease:

You are sick, and the test correctly identifies you as sick.
You are healthy, and the test correctly shows you are healthy.
You are sick, but the test says you are healthy.
You are healthy, but the test says you are sick.

The first two outcomes are correct diagnoses, while the last two represent errors in medical testing.
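
The four outcomes above are the cells of what statisticians call a confusion matrix. Here is a minimal Python sketch of that mapping; the function name `classify_outcome` and its labels are illustrative choices, not part of any particular library.

```python
def classify_outcome(is_sick: bool, test_positive: bool) -> str:
    """Name the outcome for one patient, given the truth and the test result."""
    if is_sick and test_positive:
        return "true positive"    # sick, and the test says sick
    if not is_sick and not test_positive:
        return "true negative"    # healthy, and the test says healthy
    if is_sick and not test_positive:
        return "false negative"   # sick, but the test says healthy
    return "false positive"       # healthy, but the test says sick


# Walk through all four combinations of truth and test result.
for is_sick in (True, False):
    for test_positive in (True, False):
        print(is_sick, test_positive, classify_outcome(is_sick, test_positive))
```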

Hypothesis Testing in Healthcare

Before interpreting any medical test, doctors often think in terms of statistical hypotheses:

Null Hypothesis (H₀): The patient does not have the disease.
Alternative Hypothesis (H₁): The patient does have the disease.

After conducting the test (for example, a blood test, an imaging scan, or a lab culture), the results determine whether we reject or fail to reject the null hypothesis.

Type I Error (False Positive)

A Type I error occurs when the null hypothesis is rejected even though it is true.

In medical terms, this means the test shows you have the disease when you actually do not: a false positive.

Example: Cancer Screening

If a cancer test incorrectly says you have cancer, it can cause:

Emotional distress for the patient and family.
Unnecessary follow-up tests or treatments.
Potential side effects from treatments that were never needed.
Financial costs for both the patient and the healthcare system.

Possible causes: Test contamination, human error, or low test specificity.
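
Specificity, listed above as a possible cause, is the share of healthy patients that a test correctly clears. The lower it is, the more false positives the test produces. The sketch below computes it from counts; the numbers are invented for illustration and do not describe any real test.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of healthy patients the test correctly reports as healthy."""
    healthy_total = true_negatives + false_positives
    if healthy_total == 0:
        raise ValueError("need at least one healthy patient")
    return true_negatives / healthy_total


# Hypothetical counts for 1,000 healthy patients (illustration only).
tn, fp = 950, 50
spec = specificity(tn, fp)
false_positive_rate = 1 - spec  # the Type I error rate among healthy patients
print(f"specificity = {spec:.2f}, false positive rate = {false_positive_rate:.2f}")
```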

Type II Error (False Negative)

A Type II error happens when the null hypothesis is not rejected even though it is false.

In medical terms, this means the test shows you are healthy when you actually have the disease: a false negative.

Example: Cancer Screening

If the test misses cancer, it can lead to:

Missed opportunity for early treatment.
Disease progression, possibly worsening the prognosis.
Increased long-term healthcare costs due to late-stage treatment.

Possible causes: Low test sensitivity, poor sample quality, or inadequate detection technology.
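
Sensitivity, listed above as a possible cause, is the share of sick patients that a test correctly detects. The sketch below computes it from counts; the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of sick patients the test correctly reports as sick."""
    sick_total = true_positives + false_negatives
    if sick_total == 0:
        raise ValueError("need at least one sick patient")
    return true_positives / sick_total


# Hypothetical counts for 100 sick patients (illustration only).
tp, fn = 80, 20
sens = sensitivity(tp, fn)
false_negative_rate = 1 - sens  # the Type II error rate among sick patients
print(f"sensitivity = {sens:.2f}, false negative rate = {false_negative_rate:.2f}")
```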

Which Is “Better”?

For me, a false negative feels better than a false positive because at least I wouldn’t have to deal with the anxiety and financial strain of unnecessary treatment. However, I also recognize that the risks of delayed diagnosis are far too high to ignore. That’s why I believe the ultimate goal in healthcare should always be to improve accuracy, reduce error rates, and ensure that patients receive timely and correct diagnoses.

Understanding Measures of Central Tendency and Their Importance in Data Science

Wanjiru-Njoroge — Tue, 22 Jul 2025 20:09:16 +0000

In the dynamic field of data science, interpreting data is fundamental to success. Whether predicting customer behavior, analyzing sales performance, or designing machine learning algorithms, understanding your data's characteristics is crucial. One of the primary tools for summarizing data is measures of central tendency.

What Are Measures of Central Tendency?

Measures of central tendency are statistical techniques used to determine the center or typical value of a dataset. They provide a single value that represents the entire distribution of data. The three main types are:

1. Mean (Average)

The mean is calculated by summing all values and dividing by the number of values. It is the most commonly used measure and is particularly effective for normally distributed data.

Mean = ∑x_i / n

2. Median

The median is the middle value when the data is arranged in order. If there is an even number of observations, it is the average of the two middle numbers. This measure is especially useful when the data contains outliers or is skewed.

3. Mode

The mode is the most frequently occurring value in a dataset. It is particularly valuable for categorical data and for identifying common trends.
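
All three measures can be computed with Python's standard `statistics` module. This is a small sketch; the scores and colours are made-up sample values.

```python
import statistics

scores = [70, 85, 85, 90, 100]  # hypothetical exam scores

mean_score = statistics.mean(scores)      # sum of the values divided by their count
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)

# With an even number of observations, the median averages the two middle values.
even_median = statistics.median([1, 2, 3, 4])

# The mode also works on categorical data.
favourite_colour = statistics.mode(["red", "blue", "red"])
print(even_median, favourite_colour)
```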

Why Are They Important in Data Science?

1. Data Summarization

Measures of central tendency enable data scientists to quickly grasp the general pattern of a dataset. This is essential during Exploratory Data Analysis (EDA), which aims to familiarize analysts with the data before conducting deeper modeling or analysis.

2. Identifying Data Distribution

Understanding the mean, median, and mode helps determine if the data is normally distributed or skewed. This knowledge informs the selection of appropriate models and algorithms.

For normally distributed data: mean ≈ median ≈ mode.

In skewed data: the mean is pulled toward the longer tail, so it tends to sit above the median in right-skewed data and below it in left-skewed data.
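
As a rough illustration, the gap between mean and median can hint at the direction of skew. The function below is a heuristic sketch, not a formal test. The name `skew_direction` and the `tolerance` threshold are assumptions made for this example.

```python
import statistics


def skew_direction(values, tolerance=0.05):
    """Guess the skew direction from the gap between mean and median.

    `tolerance` is a hypothetical threshold, expressed as a fraction of the
    standard deviation. It is an illustrative choice, not a standard value.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.pstdev(values)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


incomes = [30, 32, 35, 36, 38, 40, 200]  # hypothetical values with a long right tail
print(skew_direction(incomes))
```

A histogram or a proper skewness statistic gives a more reliable picture. This comparison is only a quick first check.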

3. Outlier Detection

Significant discrepancies between the mean and median may indicate the presence of outliers. Identifying these is crucial, as outliers can distort models and predictions.
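
To see why the gap matters, compare the same data with and without one extreme value. The wait times below are invented for illustration, and `mean_median_gap` is a hypothetical helper name.

```python
import statistics


def mean_median_gap(data):
    """Return mean minus median; a large gap may point to outliers or skew."""
    return statistics.mean(data) - statistics.median(data)


wait_times = [12, 15, 14, 13, 16]   # hypothetical wait times in minutes
with_outlier = wait_times + [240]   # one extreme visit added

for label, data in (("without outlier", wait_times), ("with outlier", with_outlier)):
    print(label, statistics.mean(data), statistics.median(data), mean_median_gap(data))
```

The single extreme value moves the mean by tens of minutes while the median barely changes.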

4. Modeling and Machine Learning

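Some techniques come with distributional assumptions. Linear regression, for example, assumes approximately normal residuals (not normally distributed raw data) when it is used for inference. Measures of central tendency help check such assumptions and guide preprocessing steps, such as transforming heavily skewed variables.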

5. Communication and Reporting

Stakeholders often prefer simplified insights. For example:

“The average sales this quarter were $5,000”

is much clearer than presenting raw data.

Real-Life Examples

Healthcare: In analyzing patient wait times, the median is often more meaningful than the mean, as outliers (e.g., emergencies) can skew the average.
Retail: The mode helps identify the best-selling product in a store.
Finance: Investment analysts frequently report average returns but must also consider median returns to understand skewed distributions.

🔗 Understanding Relationships in Power BI

Wanjiru-Njoroge — Sun, 22 Jun 2025 13:58:20 +0000

👥 What Is a Relationship in Power BI?

A relationship tells Power BI how tables are connected, typically through a shared column such as CustomerID, ProductID, or Date.

Without these relationships, Power BI cannot match data between tables, leading to blank entries or inaccurate totals in your reports.
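
Conceptually, a relationship behaves like a lookup on the shared column. The Python sketch below is only an analogy and does not show how Power BI works internally. The table contents and the function name `total_by_customer` are invented for illustration.

```python
# Hypothetical "Customers" lookup table keyed by CustomerID.
customers = {1: "Amina", 2: "Brian"}

# Hypothetical "Sales" rows; CustomerID 3 has no match in customers.
sales = [
    {"CustomerID": 1, "Amount": 100},
    {"CustomerID": 2, "Amount": 250},
    {"CustomerID": 3, "Amount": 75},
]


def total_by_customer(sales_rows, customer_lookup):
    """Sum sales per customer name by matching on the shared CustomerID."""
    totals = {}
    for row in sales_rows:
        # Keys with no match land in a "(Blank)" bucket, similar in spirit
        # to the blank entries a report shows for unmatched rows.
        name = customer_lookup.get(row["CustomerID"], "(Blank)")
        totals[name] = totals.get(name, 0) + row["Amount"]
    return totals


print(total_by_customer(sales, customers))
```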

🔄 The 3 Types of Relationships

1. One-to-One (1:1)
Rare but useful when both tables contain unique values.
For example, the Employee table and the EmployeeDetails table.

2. One-to-Many (1:*)
The most common type of relationship.
For example, one product can have many sales.

3. Many-to-Many (*:*)
Challenging to manage.
For example, sales representatives who each cover several regions, where each region also has several representatives.
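
Which of the three types applies depends on whether the key values are unique on each side. Here is a minimal Python sketch of that check; `infer_cardinality` is a hypothetical helper, not a Power BI function, and the key lists are made up.

```python
def infer_cardinality(left_keys, right_keys):
    """Classify a relationship by checking key uniqueness on each side."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "1:*"
    if right_unique:
        return "*:1"
    return "*:*"


product_ids = ["P1", "P2", "P3"]              # hypothetical Products table keys
sales_product_ids = ["P1", "P1", "P2", "P3"]  # hypothetical Sales table keys

print(infer_cardinality(product_ids, sales_product_ids))
```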

🛠️ How to Build a Relationship
Open Model view.
Drag a column from one table to the corresponding column in another table.
Select the appropriate cardinality (1:1, 1:*, etc.).
Choose a filter direction (typically single).
Now, the tables are connected.

⚠️ Factors That Break Relationships

Duplicate values in the key column on the "one" side (Power BI requires unique values there for one-to-one and one-to-many relationships).
Ambiguous filter paths.
Relationships established in the wrong direction.
Failing to create a relationship altogether 😅.

✅ Pro Tips
Maintain clean and unique lookup tables (Customers, Products, Dates).
Avoid circular relationships.
Always review your model diagram—it serves as your map 🗺️.
Test relationships using simple visuals, such as slicers.

Getting started with Excel for Data Analysis: What I have learned so far

Wanjiru-Njoroge — Tue, 10 Jun 2025 19:29:51 +0000

In the past, when I opened Excel, it just looked like an empty grid — rows, columns, cells, and tables. It felt dry and technical, like a digital notebook without much purpose.

But once I began learning data analysis, I quickly realized Excel is far more than a basic spreadsheet tool. It's one of the most accessible and powerful platforms for working with data — especially when you're just starting out.

So, What Is Excel Really?
At first glance, Excel might seem simple. But in reality, it's a blank canvas for all kinds of data work. Whether you're analyzing finances, organizing surveys, managing inventory, or tracking student performance, Excel allows you to store, structure, and explore your data — all in one place.

It’s part calculator, part notebook, part detective. And with just a few key skills, Excel becomes your go-to tool for understanding what your data is really saying.

💼 Real-World Uses of Excel in Data Analysis
Here are just a few ways Excel is used across different industries to support data-driven decisions:

📋 HR Analytics: Understanding People Through Data
HR teams do more than just handle hiring paperwork. They use data to improve employee experiences and workplace performance.

Example:
An HR manager can use Excel to track employee attendance over time. With conditional formatting, they might highlight departments with the highest absence rates — helping leadership address burnout or workflow issues.

📦 Inventory and Supply Chain: Staying on Top of Stock
In retail and logistics, inventory accuracy is everything. Excel helps businesses track stock levels, monitor supplier performance, and forecast restocking needs.

Example:
A small business owner might use Excel to log incoming shipments and daily sales. With a few formulas, they can easily see which products are running low and spot delays from suppliers — before it impacts the bottom line.

🏥 Healthcare Data: Improving Efficiency
While Excel doesn’t replace hospital systems, it’s incredibly useful for managing non-sensitive healthcare data like appointments, staff schedules, or treatment plans.

Example:
A local clinic uses Excel to record weekly patient visits. By grouping the data by condition and visualizing trends, they can adjust staff availability and reduce wait times during peak hours.

🔍 Excel Features I’ve Come to Love
Learning Excel has also introduced me to a few powerful tools and formulas. These are some of my go-to favorites:

VLOOKUP: A lifesaver when combining datasets. I’ve used it to match product names with IDs or to connect survey results with demographic info.

Conditional Formatting: This makes patterns jump off the page — like highlighting overdue tasks, duplicate entries, or high scores.

IF Statements: Great for applying logic and flagging data — for example, labeling rows as “Pass” or “Fail” based on a score.
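
The logic behind two of these features can be written out in Python as an analogy. This is a sketch of the ideas, not a description of how Excel evaluates formulas. The product table, the function names, and the pass mark of 50 are all assumptions made for illustration.

```python
# Hypothetical product table: ID -> name (the role a VLOOKUP range plays).
product_names = {"P001": "Notebook", "P002": "Pen"}


def lookup_name(product_id, table, default="#N/A"):
    """Exact-match lookup, loosely analogous to VLOOKUP with exact matching."""
    return table.get(product_id, default)


def pass_or_fail(score, pass_mark=50):
    """Same logic as an IF formula that labels a score; pass_mark is assumed."""
    return "Pass" if score >= pass_mark else "Fail"


print(lookup_name("P001", product_names))  # found in the table
print(lookup_name("P999", product_names))  # not found, falls back to the default
print(pass_or_fail(72), pass_or_fail(41))
```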

💭 How Excel Changed the Way I See Data
Before I learned Excel, data felt like noise — just numbers in rows and columns that didn’t mean much until someone else interpreted them.

But now? I see data as a conversation.

When I open a dataset, I instinctively ask:
Are there patterns? Outliers? Is something missing?
I use Excel to explore these questions — almost like interviewing the data. And in return, it gives me answers through charts, summaries, and calculated fields.

More importantly, Excel has made me more careful. I’ve learned to look twice before jumping to conclusions, and I’ve gained a deep appreciation for how a simple tool can surface powerful insights.