Sylvester Promise
Day 37 of improving my Data Science skills

One reason your data insights don't land (even when the analysis is correct)
A small but frustrating struggle I keep seeing among data users, analysts, founders, and hiring managers is this: "The data is right, but the output is confusing, misleading, or unusable."
This usually shows up in two places:
1️⃣ Data visualizations that look fine… but don't communicate
I've been learning data visualization with Matplotlib, and one key lesson stood out: good charts are not about aesthetics; they're about accessibility and decision clarity.
Some practical fixes I learned:
Scatterplots are powerful for relationships, but when there's a third variable, encoding it with color (the c= argument) instantly adds context.

Scatterplot with third variable c
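A minimal sketch of that idea, using hypothetical data (ad spend, revenue, and team size are made up for illustration): the c= argument maps a third variable onto color, and a colorbar makes it readable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: ad spend vs. revenue, with team size as a third variable
rng = np.random.default_rng(42)
spend = rng.uniform(1, 100, 50)
revenue = 2 * spend + rng.normal(0, 10, 50)
team_size = rng.integers(1, 20, 50)

fig, ax = plt.subplots()
points = ax.scatter(spend, revenue, c=team_size, cmap="viridis")
fig.colorbar(points, ax=ax, label="Team size")  # legend for the third variable
ax.set_xlabel("Ad spend")
ax.set_ylabel("Revenue")
fig.savefig("scatter_third_variable.png")
```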
Dark backgrounds may look cool, but they reduce readability when shared.
If color matters, use colorblind-friendly styles like:
✔️seaborn-colorblind
✔️tableau-colorblind10
These preserve meaning for everyone, not just people with perfect color vision.
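A quick sketch of applying one of these styles. Note that Matplotlib 3.6+ renamed the seaborn-derived styles (e.g. seaborn-colorblind became seaborn-v0_8-colorblind), so this falls back depending on what's available:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Matplotlib 3.6+ renamed the seaborn styles; pick whichever name exists
style = ("seaborn-v0_8-colorblind"
         if "seaborn-v0_8-colorblind" in plt.style.available
         else "seaborn-colorblind")

# Apply the style only within this context, not globally
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.bar(["A", "B", "C"], [3, 7, 5])
    fig.savefig("colorblind_bar.png")
```

The same pattern works with plt.style.context("tableau-colorblind10"), which ships with Matplotlib under that name.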

If your work might be printed:
✔️Use less ink
✔️Consider grayscale styles for black-and-white printers
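For the print case, a small sketch: Matplotlib's built-in "grayscale" style previews how a chart will survive a black-and-white printer before anyone hits print.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Preview how the chart reads without color
with plt.style.context("grayscale"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3], marker="o")
    ax.set_title("Grayscale preview")
    fig.savefig("grayscale_preview.png")
```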

Effective method for visualization
A visualization that excludes part of your audience is a broken visualization.

I wrapped up the Intermediate Importing Data in Python course today, including exercises that involved collecting data via the Twitter API. That felt like closing a chapter: pulling data from real systems, understanding authentication, and working with messy, real-world responses instead of clean examples.

Statement of Accomplishment
But instead of moving on, I did something intentional: I enrolled in a new track, Importing & Cleaning Data in Python, and immediately started the Cleaning Data in Python course, which leads to point 2️⃣.
2️⃣ "Clean" data that isn't actually clean
When cleaning data, type constraints quietly decide what analyses are even possible. Sometimes the problem isn't the model, the visualization, or the question; it's that the data is pretending to be something it's not.

Common data types
Numbers that should be categories.
Categories treated like numbers.
Decisions made on assumptions no one stopped to question.
What looks like a number doesn't always behave like one.
For example: codes, ratings, and categories stored as numbers can break an analysis if you don't enforce data type constraints. In pandas, you can store them properly with the 'category' dtype.
Changing numeric data to category
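A minimal sketch of that fix, with hypothetical survey data: once the column is converted to the category dtype, pandas blocks the numeric aggregations that were quietly producing nonsense.

```python
import pandas as pd

# Hypothetical survey data: a rating code stored as a plain integer
df = pd.DataFrame({"rating_code": [1, 2, 3, 2, 1]})

# As an integer column, pandas happily computes a meaningless average
mean_before = df["rating_code"].mean()  # numerically valid, semantically dubious

# Enforce the type constraint: these values are labels, not quantities
df["rating_code"] = df["rating_code"].astype("category")

# Numeric reductions now raise, protecting the meaning of the data
try:
    df["rating_code"].mean()
except TypeError:
    print("mean() no longer allowed on a categorical column")
```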
Learning this made one thing clear: Cleaning data is about protecting meaning.
This is especially important when importing:
HDF5 files (hierarchical, complex structures)
MATLAB (.mat) files using scipy.io
Data pulled from APIs (like Twitter), where having structure doesn't guarantee quality

Importing HDF5 files
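A self-contained sketch of both imports (the files and keys here are created on the spot for illustration): with HDF5, walk the hierarchy before assuming its structure; with .mat files, scipy.io.loadmat returns a dict whose metadata keys start with '__'.

```python
import numpy as np
import h5py
import scipy.io

# Create small example files so the sketch is self-contained
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("group/values", data=np.arange(5))
scipy.io.savemat("demo.mat", {"values": np.arange(5)})

# HDF5: inspect the hierarchy before trusting its structure
with h5py.File("demo.h5", "r") as f:
    f.visit(print)  # prints each group/dataset path in the file
    values = f["group/values"][:]

# MATLAB: loadmat returns a dict; '__'-prefixed keys are file metadata
mat = scipy.io.loadmat("demo.mat")
data_keys = [k for k in mat if not k.startswith("__")]
```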
Why this matters for decision-makers and hiring managers
Anyone can load data. Anyone can plot a chart. Anyone can call an API.
But not everyone:
Preserves meaning during import
Enforces correct data types
Designs visuals that work for real humans
Thinks about how insights will be consumed, printed, or acted on

That gap is what silently kills trust in data. If you have ever stared at a chart or dataset thinking "Why doesn't this sit right?", this might be why.

The work today reminded me that good data work isn't louder or fancier. It's quieter. More intentional. More honest.

That's the kind of data work I want to be known for!

-SP
