
info_brust


Why I’m Learning Data Analysis to Get Better at Machine Learning

Hey everyone! 👋

I’m currently on a journey to become a machine learning developer — but not in the way you might expect.

Before diving deep into neural networks, models, and algorithms, I decided to take a detour and focus on data analysis. And I think it’s one of the best decisions I’ve made.

In this article, I want to share:

  • Why I chose to learn data analysis first
  • What I’m learning
  • How it's making me better at machine learning
  • What tools and projects I’m using along the way

🧠 Why Data Analysis?

It’s simple: machine learning is nothing without data.

Sure, models are exciting. But the truth is, even the most advanced ML algorithms produce unreliable results if your data is messy, incomplete, or irrelevant. And that’s where data analysis comes in.

Here’s what I realized:

  • Data analysis teaches you to understand and explore data.
  • It helps you clean and prepare data for machine learning.
  • It trains you to ask the right questions before building a model.
  • It prevents "garbage in, garbage out" syndrome.

So instead of just learning algorithms and training models, I wanted to first learn how to handle and understand data like a pro.


🛠️ What I’m Learning

Here’s my current learning stack and workflow:

1. Python for Data Analysis

I already knew Python, so I focused on:

  • pandas – for manipulating data
  • numpy – for numerical operations
  • matplotlib & seaborn – for data visualization
  • plotly – for interactive charts

2. SQL Basics

Since a lot of real-world data lives in databases, I’m also learning:

  • Basic SELECT statements
  • GROUP BY, ORDER BY, JOINs
  • Filtering, aggregating, subqueries
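To make these SQL ideas concrete, here is a minimal sketch that runs a query from Python using the built-in sqlite3 module. The `customers` and `orders` tables, their columns, and the rows are all made up for illustration; they are not from any real dataset.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'US'), (2, 'DE'), (3, 'US');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 50.0), (4, 3, 5.0);
""")

# JOIN the tables, filter with a subquery (orders above the average amount),
# aggregate per country with GROUP BY, then sort with ORDER BY.
query = """
    SELECT c.country, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount > (SELECT AVG(amount) FROM orders)
    GROUP BY c.country
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for country, n_orders, total in rows:
    print(country, n_orders, total)
```

Running everything in memory keeps the example self-contained, so it is easy to tweak the query and see how the result changes.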

I’m using platforms like:

3. Exploratory Data Analysis (EDA)

I practice exploring datasets by:

  • Finding missing values
  • Detecting outliers
  • Understanding distributions
  • Looking for patterns and correlations
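Here is a rough sketch of what those four checks can look like in pandas. The small DataFrame and its `age` and `fare` columns are invented for the example, and the 1.5 × IQR rule is just one common way to flag outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset with a couple of gaps and one extreme value.
df = pd.DataFrame({
    "age": [22, 35, 58, np.nan, 41, 29, 95],
    "fare": [7.3, 53.1, 8.1, 8.5, 26.0, np.nan, 512.3],
})

# 1. Missing values: count of NaN entries per column.
missing = df.isna().sum()

# 2. Outliers: flag fares outside 1.5 * IQR from the quartiles.
q1 = df["fare"].quantile(0.25)
q3 = df["fare"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# 3. Distributions: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# 4. Correlations: pairwise Pearson correlation between numeric columns.
corr = df.corr()

print(missing, outliers, summary, corr, sep="\n\n")
```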

📊 Projects I'm Working On

I learn best by doing. Here are some small projects I’ve started:

🚀 Netflix Dataset (from Kaggle)

  • Questions: Which countries have the most Netflix shows? What genres are most common?
  • Tools: pandas, seaborn
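A minimal sketch of how those two questions could be answered with pandas. It assumes the dataset has `country` and `listed_in` (genre) columns where one cell can hold several comma-separated values; the four rows below are made up so the example runs without downloading anything.

```python
import pandas as pd

# Made-up sample rows; the column names are an assumption about the Kaggle file.
shows = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States", "India", "United States, India", None],
    "listed_in": ["Dramas, Comedies", "Dramas", "Documentaries", "Comedies, Dramas"],
})

def count_values(column: pd.Series) -> pd.Series:
    """Split comma-separated cells into single values and count each one."""
    return (
        column.dropna()        # skip rows with no value
        .str.split(",")        # "A, B" -> ["A", " B"]
        .explode()             # one row per list element
        .str.strip()           # remove stray spaces
        .value_counts()        # most common first
    )

top_countries = count_values(shows["country"])
top_genres = count_values(shows["listed_in"])

print(top_countries)
print(top_genres)
```

From here, the counts could be passed to a seaborn bar plot to chart the top entries.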

🌍 COVID-19 Data

  • Goal: Analyze the trend of cases and deaths over time by country
  • Tools: plotly for interactive visualizations
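As a sketch of the trend analysis, the snippet below computes a running total and a 3-day rolling average per country. The `date`, `country`, and `new_cases` columns and all the numbers are invented for illustration, not real COVID-19 figures.

```python
import pandas as pd

# Made-up daily counts for two hypothetical countries.
cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "country": ["A"] * 3 + ["B"] * 3,
    "new_cases": [10, 20, 60, 5, 5, 20],
})

# Sort so the cumulative and rolling calculations follow time order.
cases = cases.sort_values(["country", "date"])

# Running total of cases within each country.
cases["total_cases"] = cases.groupby("country")["new_cases"].cumsum()

# 3-day rolling mean within each country to smooth daily noise.
cases["avg_3d"] = cases.groupby("country")["new_cases"].transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)

print(cases)
```

The resulting table is already in the long format that `plotly.express.line` expects, so something like `px.line(cases, x="date", y="avg_3d", color="country")` should give one interactive line per country.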

🚢 Titanic Dataset (classic!)

  • Goal: Perform detailed EDA to understand which passengers survived and why
  • Not building a model yet — just exploring!
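One simple way to start on the "who survived" question is to compare survival rates across groups. This sketch assumes `Sex`, `Pclass`, and `Survived` columns like the ones in the Kaggle file; the six rows are made up, so the printed rates say nothing about the real passengers.

```python
import pandas as pd

# Made-up passengers; Survived is 1 for yes and 0 for no.
passengers = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Pclass": [1, 3, 3, 1, 3, 2],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the share of survivors in each group.
rate_by_sex = passengers.groupby("Sex")["Survived"].mean()

# Same idea across two dimensions: rows are Sex, columns are Pclass.
rate_table = passengers.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
)

print(rate_by_sex)
print(rate_table)
```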

🧩 How This Helps Me in Machine Learning

Here’s what data analysis has unlocked for me:

  • Better Feature Engineering
    I now know how to extract meaningful features from raw data.

  • Data Cleaning Superpowers
    I can spot issues before they wreck my model’s performance.

  • Model Understanding
    I can explain results better because I understand the data behind them.

  • More Confidence
    I feel more prepared to enter machine learning with a strong foundation.
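To give a feel for the feature engineering point, here is a small sketch that derives new columns from raw ones. The `Name`, `SibSp`, and `Parch` columns follow the Titanic naming as an assumption, the names are invented, and `Title`, `FamilySize`, and `IsAlone` are illustrative features rather than a recommended set.

```python
import pandas as pd

# Made-up raw rows: a name string plus counts of relatives aboard.
raw = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann"],
    "SibSp": [1, 1, 0],   # siblings/spouses aboard
    "Parch": [0, 2, 0],   # parents/children aboard
})

features = raw.copy()

# Pull the title (the text between the comma and the first period).
features["Title"] = features["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# Combine two raw counts into one feature, counting the passenger too.
features["FamilySize"] = features["SibSp"] + features["Parch"] + 1

# Binary flag for passengers travelling without relatives.
features["IsAlone"] = (features["FamilySize"] == 1).astype(int)

print(features[["Title", "FamilySize", "IsAlone"]])
```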


💡 Final Thoughts

Machine learning is exciting, but it doesn’t start with models. It starts with data.

If you’re also learning ML, I highly recommend starting with data analysis. You’ll:

  • Save time
  • Build stronger models
  • Understand real-world problems better

This is just the beginning of my journey. I plan to share more of my progress and projects soon!

Thanks for reading 🙌
Let’s connect — drop your thoughts, suggestions, or your own experience below!
