Introduction
Hey, I’m Alex, and I'll start by saying I love Python. Why? Because it makes data science ridiculously easy to get into. You don’t need years of coding experience or a fancy degree—just a few key tools, and you’re already doing real data analysis.
Think of Python like a Swiss Army knife that has various "tools" (libraries) built for different tasks, and you only need a few to start slicing through real-world data. In this article, I’ll show you how far just these three libraries—Pandas, NumPy, and Matplotlib—can take you. If you’ve never written a line of code, don’t worry. I’ll guide you through every step, from setting up your first coding environment to analyzing and visualizing data.
By the end, you’ll have written your first real data science script, and I’ll even give you a challenge to test your skills. Let’s go!
What is a Library? (And Why Should You Care?)
If you've never heard of libraries before, think of them as pre-built toolkits that save you time. Imagine you want to build a house. You could cut every piece of wood and make every single nail from scratch… or you could just grab a hammer and some pre-cut planks from a store and get started.
That's what libraries do for coding. Instead of writing complex programs from scratch, you can "import" a library and use the pre-made tools to make your life easier. Python has thousands of libraries, but today, we’re using just three that will take you surprisingly far in data science.
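To make that concrete, here's a tiny taste of what importing a library buys you (a minimal sketch, nothing you need to run just yet):
import numpy as np  # "np" is just the common nickname people give NumPy

# One line gives you the average of a list, no loops required
print(np.mean([3, 5, 7]))  # prints 5.0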
Setting Up Your Coding Environment
Before we write any code, you need a place to actually run it. Let’s set up your coding environment.
Step 1: Install Python
If you don’t have Python yet, download and install it from python.org. Make sure to check the box that says “Add Python to PATH” during installation.
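Once the installer finishes, you can confirm it worked by opening a terminal and running (on some systems the command is python3 instead of python):
python --version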
Step 2: Install Jupyter Notebook
Jupyter Notebook makes running Python code easy, especially for data science. Open your terminal (or command prompt) and type:
pip install jupyter numpy pandas matplotlib
Once installed, launch Jupyter Notebook by running:
jupyter notebook
This will open a browser window where you can write and run Python code in an interactive way.
(If you prefer VS Code, check out this guide on setting it up with Jupyter Notebook.)
Your First Data Science Project in Python
Now that you're set up, let’s do some real data science. We'll create a dataset, analyze it, clean it, and visualize it—all with just three libraries.
Step 1: Create a Simple Dataset
Instead of downloading a dataset, we’ll generate one using NumPy and Pandas. This will show you how much you can do with just these libraries.
import numpy as np
import pandas as pd
# Creating a simple dataset with 100 rows
data = {
'Age': np.random.randint(20, 60, 100),
'Salary': np.random.randint(30000, 120000, 100),
'City': np.random.choice(['New York', 'London', 'Tokyo'], 100)
}
df = pd.DataFrame(data)
# Display first 5 rows
print(df.head())
🔹 What just happened?
We created a dummy dataset with 100 people, each having an age, salary, and city. The dataset looks just like an Excel table but is now a DataFrame, which is Pandas' way of handling structured data.
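To give you a quick taste of what a DataFrame can do, you can grab any column by name and run calculations on it directly (the column names here are just the ones we created above):
print(df['Age'].head())     # Just the Age column
print(df['Salary'].mean())  # Average salary across all 100 people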
Step 2: Get an Overview of the Data
Before doing anything, always check what your data looks like:
print(df.info()) # Shows column names, data types, and missing values
print(df.describe()) # Summary statistics
print(df.shape) # Number of rows and columns
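One extra check I personally like for categorical columns such as City (optional, but handy):
print(df['City'].value_counts())  # How many rows fall into each city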
Step 3: Clean the Data
In the real world, data is never perfect—it often has gaps, errors, or inconsistencies. In Data Science, we call this "dirty data", and it usually means missing values, incorrect formats, or duplicates.
Since we created our dataset artificially, it’s already clean. But let’s simulate a common data cleaning process by checking for missing values and removing them if necessary:
print(df.isnull().sum()) # Check for missing values
df_clean = df.dropna() # Remove rows with missing values
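Since our generated data has no gaps, dropna() won't actually remove anything here. If you want to see it do some real work, here's a small sketch that deliberately knocks a few values out of a copy first (df_dirty and df_fixed are just names I made up):
# Make a copy and poke a few holes in it on purpose
df_dirty = df.copy()
df_dirty.loc[0:4, 'Salary'] = np.nan    # the first 5 salaries go missing

print(df_dirty.isnull().sum())          # Salary now reports 5 missing values
df_fixed = df_dirty.dropna()            # drop those 5 incomplete rows
print(df_fixed.shape)                   # (95, 3) instead of (100, 3)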
💡 Why does this matter?
Imagine running an analysis on customer purchases, but half of the purchase amounts are missing. Any insights you get would be misleading. Cleaning data ensures you're working with accurate information!
Step 4: Explore the Data (EDA)
Now let’s dig deeper and find something interesting. For example, what’s the average salary in each city? Here’s the code for that:
print(df.groupby('City')['Salary'].mean())
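If you want more than just the average, groupby can hand back several statistics at once. Here's an optional extra you can try (not needed for the rest of the article):
print(df.groupby('City')['Salary'].agg(['mean', 'min', 'max']))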
Step 5: Visualizing the Data
Numbers are cool, but charts make insights obvious. Let’s plot a bar chart to compare salaries across cities:
import matplotlib.pyplot as plt
df.groupby('City')['Salary'].mean().plot(kind='bar')
plt.title('Average Salary by City')
plt.xlabel('City')
plt.ylabel('Salary')
plt.show()
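Bar charts are just the start. The same one-liner pattern gives you other chart types too; for example, here's an optional quick histogram of ages:
df['Age'].plot(kind='hist', bins=10)   # distribution of ages in our dataset
plt.title('Age Distribution')
plt.xlabel('Age')
plt.show()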
🔥 Boom! You just built your first real data science analysis! 🎉
This is a simplified version of what top companies hire data scientists to do. The only difference? Experience and practice.
Keep practicing, keep exploring, and soon enough, you'll be good enough to get hired. Every expert started right where you are now. Stay consistent, and you’ll get there faster than you think. 🚀
Challenges to Try on Your Own
Now it’s your turn. Try these challenges using the dataset we created (Google anything you get stuck on; that’s how you learn and make it your own):
- Find the average age in each city
- Create a scatter plot showing the relationship between Age and Salary
- Filter out people earning less than $50,000 and visualize the results
If you complete these, congrats—you’re already doing real data science!
The Most Important Skill in Data Science: Asking Questions
Whether you’re a beginner or an expert, you will always be asking questions. From simple things like “How do I install Pandas?” to deep topics like “How do I optimize machine learning models?”—the key is to never feel bad about asking.
Embrace Googling and searching for answers. Even top data scientists do it every single day.
If you ever get stuck, just type your question into Google, Stack Overflow, or a Python documentation site. 99.99% of the time, someone else has already asked it!
Conclusion
We started with zero experience, set up a coding environment, created a dataset, analyzed it, cleaned it, and visualized insights. And we did all of that with just three Python libraries.
Data science isn’t about memorizing everything—it’s about getting started and learning as you go. Keep practicing, keep asking questions, and you’ll be amazed at how fast you improve.
Want to see more tech stuff like this? Connect with me on LinkedIn or check out my GitHub, where I share the Python and data science projects I'm working on!