Introduction
For a long time, I viewed programming as something reserved for software engineers and computer scientists. As someone with a background in scientific research and a growing interest in data analytics, I assumed tools like Excel, SQL, and Power BI were enough to answer most questions hidden in data.
Then I started learning Python, and what first looked like a programming language full of strange syntax quickly revealed itself as one of the most powerful tools a data analyst or data scientist can have. Python is not just about writing code; it is about automating repetitive tasks, cleaning messy datasets, analysing millions of records, and creating reproducible workflows that can be shared with anyone.
In this article, I share my beginner-friendly understanding of Python and how it is used in the data analytics space. If you are just starting your journey in data analysis, this guide will give you a practical overview of what Python is, why it matters, and the core concepts you need to know.
What Is Python?
Python is a high-level, general-purpose programming language known for its readability and simplicity.
It was created by Guido van Rossum and first released in 1991. A core idea behind its design is that code should be easy to read and easy to write.
Unlike many programming languages that require complex syntax, Python uses clear and concise statements that often resemble plain English. For example, you can print a greeting by writing `print("Hello, Data World!")` and running it.
That single line displays text on the screen and demonstrates how approachable Python can be.
Why Python Is Important in Data Analysis
Python has become one of the most widely used languages in data analytics, data science, machine learning, and artificial intelligence. Its strength lies in its versatility.
1. Automating Repetitive Tasks
Data analysts often perform the same operations repeatedly:
- Renaming hundreds of files
- Cleaning dozens of spreadsheets
- Downloading reports from APIs
- Merging datasets
Python can automate these tasks. Let me give you a real-world scenario: imagine receiving 200 CSV files from different branches every month. Opening and cleaning each file manually in Excel would take hours, whereas a short Python script can apply the same steps to every file in seconds.
(Image: Python script for automating file processing)
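A minimal sketch of such a script follows. It assumes every branch file is a CSV with the same columns; the `city` and `sales` columns, the `clean_row` rules, and the `combine_branch_files` helper are hypothetical, chosen only for illustration. It uses only the standard library, and the demo at the bottom creates two small made-up files in a temporary folder so it can run anywhere.

```python
import csv
import tempfile
from pathlib import Path


def clean_row(row):
    """Hypothetical cleaning rules: trim whitespace and standardise city names."""
    cleaned = {key.strip(): value.strip() for key, value in row.items()}
    cleaned["city"] = cleaned["city"].title()
    return cleaned


def combine_branch_files(input_dir, output_file):
    """Clean every CSV in input_dir and write all rows to one output file."""
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                cleaned = clean_row(row)
                cleaned["source_file"] = path.name  # remember which branch it came from
                rows.append(cleaned)

    if rows:
        with open(output_file, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)


# Demo with two made-up branch files
with tempfile.TemporaryDirectory() as tmp:
    branches = Path(tmp) / "branches"
    branches.mkdir()
    (branches / "branch_a.csv").write_text("city,sales\n NAIROBI ,100\nmombasa,80\n")
    (branches / "branch_b.csv").write_text("city,sales\nKisumu,55\n")

    output = Path(tmp) / "combined.csv"
    total_rows = combine_branch_files(branches, output)
    combined = output.read_text()

print(total_rows)
print(combined)
```

The same loop works for 2 files or 200; only the folder contents change.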
2. Handling Large and Complex Data
Excel becomes slow when datasets grow to hundreds of thousands of rows, and a single worksheet cannot hold more than 1,048,576 rows.
Python, especially with the pandas library, can efficiently process large datasets and perform advanced transformations.
Real-World Scenario: Analysing e-commerce transactions from Jumia or Amazon with millions of records is practical in Python but cumbersome in spreadsheets.
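When a file is too large to load comfortably in one go, pandas can read it in pieces through the `chunksize` argument of `pd.read_csv`. The sketch below is illustrative only: the tiny in-memory CSV and its `category` and `amount` columns stand in for a real transactions file (in practice you would pass a path such as a hypothetical `"transactions.csv"` and a much larger chunk size), and summing revenue per category is just one example aggregation.

```python
import io

import pandas as pd

# A tiny stand-in for a large transactions file (made-up data)
csv_text = """order_id,category,amount
1,phones,250.0
2,laptops,900.0
3,phones,150.0
4,books,20.0
5,laptops,1100.0
"""

totals = {}
# Read two rows at a time instead of loading the whole file into memory
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_totals = chunk.groupby("category")["amount"].sum()
    # Add each chunk's subtotals to the running totals
    for category, amount in chunk_totals.items():
        totals[category] = totals.get(category, 0.0) + float(amount)

print(totals)
```

Only one chunk is held in memory at a time, which is what makes this pattern useful for files with millions of rows.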
3. Advanced Data Cleaning
Real-world data is rarely perfect.
You may encounter:
- Missing values
- Duplicate records
- Inconsistent text formats
- Incorrect dates
Python provides tools to clean and standardize data systematically. Real-World Scenario: Converting NAIROBI, Nairobi, and nairobi into consistent values is a simple operation in Python.
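A small sketch of that clean-up in plain Python, using a made-up list of city names: `str.strip()` removes stray spaces, `str.title()` applies consistent capitalisation, and `dict.fromkeys` removes duplicates while keeping the original order.

```python
raw_cities = ["NAIROBI", "Nairobi", " nairobi ", "Mombasa", "MOMBASA"]

# Strip stray spaces and apply consistent capitalisation
clean_cities = [city.strip().title() for city in raw_cities]
print(clean_cities)   # ['Nairobi', 'Nairobi', 'Nairobi', 'Mombasa', 'Mombasa']

# Remove duplicate records while keeping the first-seen order
unique_cities = list(dict.fromkeys(clean_cities))
print(unique_cities)  # ['Nairobi', 'Mombasa']
```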
4. Reproducibility
Every step of your analysis is stored in code.
This means:
- Your work can be repeated
- Errors can be traced
- Colleagues can reproduce your results
Python Basics Every Data Analyst Should Know
1. Variables
Variables store data values.
```python
name = "Joseph"
age = 28
```
`name` is a variable that stores the text "Joseph", and `age` is a variable that stores the number 28.
Think of variables as labeled containers.
2. Data Types
Python supports several built-in data types.
| Data Type | Python Type | Example |
|---|---|---|
| String | `str` | `"Joseph"` |
| Integer | `int` | `28` |
| Float | `float` | `23.43` |
| Boolean | `bool` | `True` / `False` |
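You can check any value's type with the built-in `type()` function. A quick sketch using the example values from the table (the variable names `price` and `is_active` are made up for illustration):

```python
name = "Joseph"
age = 28
price = 23.43
is_active = True

# type() reports the data type of any value
for value in (name, age, price, is_active):
    print(repr(value), type(value).__name__)
# 'Joseph' str
# 28 int
# 23.43 float
# True bool
```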
3. Operators
Operators allow you to perform calculations and comparisons.
Arithmetic Operators
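Arithmetic operators perform mathematical calculations. A short sketch with two sample values, chosen only for illustration:

```python
a = 17
b = 2

print(a + b)   # 19   addition
print(a - b)   # 15   subtraction
print(a * b)   # 34   multiplication
print(a / b)   # 8.5  division (always returns a float)
print(a // b)  # 8    floor division (rounds down)
print(a % b)   # 1    modulus (the remainder)
print(a ** b)  # 289  exponentiation
```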
Comparison Operators
Comparison operators are used to compare two values:
| Operator | Name | Example |
|---|---|---|
| == | Equal | x == y |
| != | Not equal | x != y |
| > | Greater than | x > y |
| < | Less than | x < y |
| >= | Greater than or equal to | x >= y |
| <= | Less than or equal to | x <= y |
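Every comparison evaluates to a Boolean (`True` or `False`), which is what lets comparisons drive filtering. A small sketch with made-up values:

```python
x = 10
y = 7

print(x == y)   # False
print(x != y)   # True
print(x > y)    # True
print(x < y)    # False
print(x >= 10)  # True
print(x <= y)   # False

# Keep only the scores that pass a threshold
scores = [45, 72, 90, 66]
passed = [s for s in scores if s >= 50]
print(passed)   # [72, 90, 66]
```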
Logical Operators
Logical operators are used to combine conditional statements:
| Operator | Description | Example |
|---|---|---|
| and | Returns True if both conditions are true | `x = 5; print(x <= 5 and x < 10)` (output: True) |
| or | Returns True if at least one of the conditions is true | `x = 5; print(x < 4 or x < 10)` (output: True) |
| not | Reverses the result: returns False if the result is true | `x = 5; print(not(x > 3 and x < 10))` (output: False) |
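In analysis work, logical operators usually combine conditions on records. A sketch using a made-up list of students; the 50-mark and 75% attendance thresholds are arbitrary examples:

```python
students = [
    {"name": "Amina", "score": 82, "attendance": 0.9},
    {"name": "Brian", "score": 48, "attendance": 0.95},
    {"name": "Chebet", "score": 75, "attendance": 0.6},
]

# "and": both conditions must hold
eligible = [s["name"] for s in students
            if s["score"] >= 50 and s["attendance"] >= 0.75]
print(eligible)      # ['Amina']

# "or": at least one condition must hold
flagged = [s["name"] for s in students
           if s["score"] < 50 or s["attendance"] < 0.75]
print(flagged)       # ['Brian', 'Chebet']

# "not": invert a condition (a non-empty list counts as true)
print(not eligible)  # False
```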
4. Data Structures
Lists
Lists are used to store multiple items in a single variable.
List items are ordered, changeable, and allow duplicate values.
List items are indexed; the first item has index [0], the second item has index [1], etc.
Lists use square brackets `[]`.
```python
fruits = ["apple", "banana", "mango"]
```
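A quick sketch of what "ordered, changeable, indexed, and allows duplicates" means in practice, using the same `fruits` list (the added and replacement items are arbitrary):

```python
fruits = ["apple", "banana", "mango"]

print(fruits[0])    # apple  (indexing starts at 0)
print(fruits[-1])   # mango  (negative indices count from the end)

fruits.append("banana")  # lists allow duplicate values
fruits[1] = "orange"     # lists are changeable
print(fruits)       # ['apple', 'orange', 'mango', 'banana']
print(len(fruits))  # 4
```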
Tuples
Tuples are used to store multiple items in a single variable.
Tuple items are ordered, unchangeable, and allow duplicate values.
Tuples use parentheses `()`.
```python
coordinates = (1.2, 3.4)
```
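Two things you will often do with tuples are unpack them into separate variables and rely on them staying fixed. This sketch shows both; trying to change an item raises a `TypeError`:

```python
coordinates = (1.2, 3.4)

# Unpack the tuple into two variables
x, y = coordinates
print(x, y)  # 1.2 3.4

# Tuples cannot be modified after creation
try:
    coordinates[0] = 5.0
except TypeError as error:
    print("Cannot modify a tuple:", error)
```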
Dictionaries
Dictionaries are used to store data values in (key: value) pairs.
A dictionary is a collection that is ordered (since Python 3.7), changeable, and does not allow duplicate keys.
Dictionaries use curly brackets `{}`.
```python
student = {"name": "Amina", "score": 90}
```
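A sketch of everyday dictionary operations on the `student` example: reading values, supplying a default for a missing key with `.get()`, updating and adding keys, and looping over pairs. The `"course"` key is made up for illustration.

```python
student = {"name": "Amina", "score": 90}

print(student["name"])            # Amina
print(student.get("age", "n/a"))  # n/a  (a default instead of a KeyError)

student["score"] = 95                 # update an existing key
student["course"] = "Data Analytics"  # add a new key

for key, value in student.items():
    print(key, "->", value)

# Keys are unique: a repeated key keeps only the last value
branch = {"city": "Nairobi", "city": "Mombasa"}
print(branch)  # {'city': 'Mombasa'}
```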
Sets
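A set is a collection that is unordered and unindexed. Its items cannot be changed in place, although you can add or remove items.
Sets are used to store multiple items in a single variable.
Sets cannot have two items with the same value.
Sets use curly brackets `{}`; an empty set, however, must be created with `set()`, because `{}` creates an empty dictionary.
```python
cities = {"Nairobi", "Mombasa", "Kisumu"}
```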
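A sketch of common set operations. Because sets are unordered, the example avoids relying on the order in which items print; the extra city names are arbitrary.

```python
cities = {"Nairobi", "Mombasa", "Kisumu"}

cities.add("Nakuru")      # items can be added...
cities.discard("Kisumu")  # ...or removed
print("Nairobi" in cities)  # True (fast membership test)

# Converting a list to a set drops duplicate values
visits = ["Nairobi", "Mombasa", "Nairobi", "Nairobi"]
print(len(set(visits)))     # 2

empty = set()               # {} would create an empty dict instead
print(type(empty).__name__) # set
```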
These structures help organise and manipulate data efficiently.
Conditional Statements
```python
marks = 75

if marks >= 70:
    print("Pass")
else:
    print("Fail")
```
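When there are more than two outcomes, `elif` adds extra branches that are checked in order. A sketch wrapped in a hypothetical `grade` function; the grade cut-offs are made up:

```python
def grade(marks):
    """Map a mark to a letter grade (the cut-offs here are made up)."""
    if marks >= 70:
        return "A"
    elif marks >= 60:
        return "B"
    elif marks >= 50:
        return "C"
    else:
        return "Fail"


for m in (75, 64, 50, 30):
    print(m, grade(m))
# 75 A
# 64 B
# 50 C
# 30 Fail
```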
For Loops
Loops repeat tasks automatically.
```python
# range(1, 6) produces 1, 2, 3, 4, 5 (the end value is excluded)
for number in range(1, 6):
    print(number)
```
Real-World Scenario: Processing each row in a dataset or iterating through multiple files.
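A sketch of the row-by-row case: a loop over a small, made-up dataset (one dictionary per row) that totals sales per branch.

```python
# A small, made-up dataset: one dict per row
sales = [
    {"branch": "Nairobi", "amount": 1200},
    {"branch": "Mombasa", "amount": 800},
    {"branch": "Nairobi", "amount": 500},
]

totals = {}
for row in sales:
    branch = row["branch"]
    # Start at 0 the first time a branch appears, then keep adding
    totals[branch] = totals.get(branch, 0) + row["amount"]

print(totals)  # {'Nairobi': 1700, 'Mombasa': 800}
```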
Functions
Functions package reusable logic.
```python
def greet(name):
    return f"Hello, {name}!"
```
Functions make code cleaner and easier to maintain.
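A slightly more analytical sketch: a hypothetical `percentage_change` helper with a default parameter. Writing the calculation once means every report uses the same formula and rounding.

```python
def percentage_change(old, new, decimals=1):
    """Return the percentage change from old to new, rounded."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return round((new - old) / old * 100, decimals)


print(percentage_change(200, 250))          # 25.0
print(percentage_change(80, 60))            # -25.0
print(percentage_change(3, 4, decimals=2))  # 33.33
```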
Python Libraries for Data Analysis
One of Python's greatest strengths is its ecosystem of libraries.
Requests
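The requests library is used to interact with web APIs.
```python
import requests

# Fetch sample product data from the DummyJSON test API
response = requests.get("https://dummyjson.com/products", timeout=10)
response.raise_for_status()  # stop early if the request failed
data = response.json()  # parse the JSON body into Python objects
```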
This is useful for collecting real-time data from online sources.
Pandas
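pandas is the most widely used Python library for data manipulation.
```python
import pandas as pd

# Load a CSV file into a DataFrame
df = pd.read_csv("sales.csv")
df.head()  # preview the first five rows
```
A DataFrame can also be built from the JSON data collected with requests:
```python
import pandas as pd

# `data` is the parsed JSON from the requests example above.
# DummyJSON's /products endpoint returns the records under a "products" key.
products = data["products"]

# Turn (up to) the first 100 records into a DataFrame
df = pd.DataFrame(products[:100])
df
```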
With pandas, you can:
- Load data
- Filter rows
- Handle missing values
- Group and summarize
- Merge datasets
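A compact sketch of several of these operations on two small, made-up DataFrames. Filling the missing amount with 0 is only one possible choice; depending on the data, dropping the row or using an average may be more appropriate.

```python
import pandas as pd

# Made-up sales and branch data
sales = pd.DataFrame({
    "branch_id": [1, 2, 1, 3, 2],
    "amount": [1200.0, 800.0, None, 450.0, 300.0],
})
branches = pd.DataFrame({
    "branch_id": [1, 2, 3],
    "city": ["Nairobi", "Mombasa", "Kisumu"],
})

# Handle missing values: treat a missing amount as 0
sales["amount"] = sales["amount"].fillna(0)

# Filter rows: keep sales above 400
big_sales = sales[sales["amount"] > 400]

# Merge datasets: attach the city name to each sale
merged = sales.merge(branches, on="branch_id", how="left")

# Group and summarize: total sales per city
summary = merged.groupby("city")["amount"].sum()
print(summary)
```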
For more information about pandas, refer to this video (YouTube link).
Python Enhancement Proposals (PEP 8)
Like people, Python has its own likes and dislikes, its own "pet peeves". It likes clean indentation, meaningful variable names, and consistent formatting, and it dislikes messy spacing, unclear names, and poorly organized code. The Python community documents these preferences in Python Enhancement Proposals (PEPs), the design documents used to propose and describe changes and conventions for the language. PEP 8, the official style guide for Python code, provides the most widely used guidelines for writing readable and consistent code.
Indentation
Python uses indentation (typically 4 spaces) to define code blocks.
```python
if True:
    print("Indented correctly")
```
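Unlike most style rules, indentation is enforced by Python itself: a block that is not indented will not even compile. One way to see this safely is to ask Python to compile two small snippets held as strings (the snippet contents are arbitrary):

```python
good = "if True:\n    print('Indented correctly')\n"
bad = "if True:\nprint('Not indented')\n"

# The correctly indented snippet compiles without complaint
compile(good, "<good>", "exec")

# The unindented one is rejected before it ever runs
raised = False
try:
    compile(bad, "<bad>", "exec")
except IndentationError as error:
    raised = True
    print("IndentationError:", error.msg)
```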
Line Length
PEP 8 recommends a maximum line length of 79 characters (72 for docstrings and comments).
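A common way to stay under the limit is to wrap long expressions inside parentheses, brackets, or braces, where Python joins the lines implicitly and no backslash is needed. A sketch with made-up values; note that the operator starts the continued line, which is the style PEP 8 recommends for new code:

```python
monthly_sales = [1200, 950, 1430, 1100]

# Implicit line continuation inside parentheses
average_sale = (
    sum(monthly_sales)
    / len(monthly_sales)
)

# One item per line keeps long lists readable
report_columns = [
    "branch",
    "month",
    "total_sales",
    "average_order_value",
]

print(average_sale)  # 1170.0
```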
Naming Conventions
Variables and Functions: snake_case
```python
total_sales = 500


def calculate_average():
    pass
```
Classes: PascalCase
```python
class StudentRecord:
    pass
```
Constants: UPPER_CASE
```python
PI = 3.14159
```
Docstrings
Docstrings describe what a function does.
```python
def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b
```
Docstrings are essential for writing maintainable code.
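Docstrings are not just comments: Python stores them on the function, so tools such as `help()` and `inspect.getdoc()` can display them. A sketch with a hypothetical `average` function that uses the multi-line form (a one-line summary, a blank line, then details):

```python
import inspect


def average(values):
    """Return the arithmetic mean of a sequence of numbers.

    Raises ZeroDivisionError if values is empty.
    """
    return sum(values) / len(values)


# The docstring travels with the function; help(average) shows it too
print(inspect.getdoc(average))
print(average([2, 4, 9]))  # 5.0
```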
Final Thoughts
Python has shown me that data analysis is not just about creating charts or writing queries; it is about building repeatable processes that turn raw data into reliable insights.
Although I am still at the beginning of my learning journey, I can already see why Python has become such an essential tool for analysts and scientists. If you are starting out, focus on the fundamentals, practice consistently, and trust that each small script you write is another step toward becoming a more effective data professional.
A few weeks into this journey, I already understand why Python is considered the backbone of data science.
And this is only the beginning!