Timothy Njeru

Introduction to Python for Data Analytics
Every data journey has a starting point. Mine began with a single question: "How do people actually make sense of all this data?" The answer, I quickly discovered, is Python.
I'm currently enrolled in the LuxDevHQ Intermediate Python Certification Course, and in this article I want to share what I've learned so far — not from a textbook perspective, but from the lens of someone actively going through the process. If you're just starting out with Python for data analytics, this article is for you.

Step 1: Setting Up the Environment — Choosing Your Tools
Before writing a single line of Python, I had to make a key decision: where do I actually write and run my code? This turned out to be more interesting than I expected, because there isn't just one answer.
Here's every tool I installed and what I learned about each:
🐍 Anaconda
Anaconda is a Python distribution built specifically for data science. When you install Anaconda, you don't just get Python — you get a whole ecosystem: pre-installed libraries, a package manager called conda, and access to tools like Jupyter Notebook and Spyder, all in one go.

Why I'd recommend it for beginners: You spend zero time fighting with installations. Everything just works.

📓 Jupyter Notebook
Jupyter Notebook is the tool I've come to love most. It lets you write Python code in cells — small blocks you can run independently — and see the output right below each cell. You can also mix code with text, images, and explanations, which makes it perfect for data analytics work and for documenting your learning.

Launch Jupyter Notebook from your terminal or Anaconda Navigator:

```bash
jupyter notebook
```

It opens right in your browser and you're ready to go.
💻 VS Code (Visual Studio Code)
VS Code is a lightweight but powerful code editor from Microsoft. With the Python extension installed, it gives you features like syntax highlighting, code suggestions (IntelliSense), and an integrated terminal. It's great for writing longer Python scripts and building real projects.
🔬 Spyder
Spyder comes bundled with Anaconda and feels a lot like MATLAB or RStudio if you've used those before. It has a Variable Explorer panel that lets you inspect your variables in real time — incredibly useful when you're learning how Python stores and handles data.
☁️ Google Colab
Google Colab is a free, cloud-based Jupyter Notebook environment from Google. No installation needed — just a Google account and a browser. It also gives you free access to GPUs, which becomes important later when you get into machine learning.

My take: I use Jupyter or Colab for exploratory work and learning, and VS Code when writing more structured scripts. Having all these tools installed means I can choose the right one for the task at hand.

Step 2: Python Basics — Writing Your First Code
With the environment set up, the real learning began. Python's syntax is clean and readable, which makes it genuinely beginner-friendly. Here's what I covered in the basics.
The print() Function — Your First Output
print() is the most fundamental function in Python. It outputs text or values to the screen. Every Python learner starts here.
```python
print("Hello, World!")
print("Welcome to Python for Data Analytics")
```

Output:

```
Hello, World!
Welcome to Python for Data Analytics
```
You can print numbers, the result of calculations, and even multiple items at once:
```python
print("The answer is:", 42)
print(10 + 5)                    # Prints 15
print("Pi is roughly", 3.14159)
```
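Worth knowing early: print() also accepts two optional keyword arguments, sep (what goes between the items) and end (what gets printed after the last one, a newline by default). A quick sketch:

```python
# sep replaces the default space between items
print("2025", "01", "15", sep="-")  # 2025-01-15

# end replaces the default newline, so the next print continues the same line
print("Loading", end="... ")
print("done")  # Loading... done
```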
The input() Function — Getting Data from the User
While print() sends information out, input() brings information in. It pauses the program and waits for the user to type something.
```python
name = input("What is your name? ")
print("Hello,", name + "! Welcome to data analytics.")
```

Example interaction:

```
What is your name? Brian
Hello, Brian! Welcome to data analytics.
```

Important note: input() always returns a string, even if the user types a number. To use it as a number, you need to convert it:

```python
age = input("Enter your age: ")
age = int(age)  # Convert string to integer
print("In 10 years, you will be", age + 10)
```
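Since int() raises a ValueError when the text isn't a valid number, a defensive version helps in real scripts. Here's a small sketch using a hypothetical parse_age() helper:

```python
def parse_age(text):
    # Return the age as an int, or None if the text isn't a valid number
    try:
        return int(text)
    except ValueError:
        return None

print(parse_age("24"))      # 24
print(parse_age("24 yrs"))  # None
```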
Combining print() and input() — A Simple Example
Here's a small program that puts both functions together:

```python
print("=== Student Score Checker ===")
student_name = input("Enter student name: ")
score = float(input("Enter score: "))

if score >= 50:
    print(student_name, "passed with a score of", score)
else:
    print(student_name, "did not pass. Score:", score)
```

Step 3: Data Types — How Python Classifies Data
One of the first concepts that truly clicked for me was data types. In analytics, data comes in many forms — names, numbers, dates, true/false flags — and Python has a specific type for each.
Here are the core data types I've learned:

1. Integer (int) — Whole Numbers

```python
student_count = 45
year = 2025
temperature = -3

print(type(student_count))  # <class 'int'>
```

2. Float (float) — Decimal Numbers

```python
gpa = 3.75
price = 1299.99
tax_rate = 0.16

print(type(gpa))  # <class 'float'>
```

3. String (str) — Text

```python
name = "Alice Wanjiru"
course = "Python for Data Analytics"
city = 'Nairobi'

print(type(name))  # <class 'str'>
```

Useful string operations:

```python
print(name.upper())                  # ALICE WANJIRU
print(course.lower())                # python for data analytics
print(len(name))                     # 13 (number of characters)
print(name.replace("Alice", "Bob"))  # Bob Wanjiru
```
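Two more string methods that come up constantly when cleaning messy text are strip() and split(). A quick sketch (the messy values are invented examples):

```python
messy_city = "  nairobi  "
print(messy_city.strip())          # "nairobi" — surrounding whitespace removed
print(messy_city.strip().title())  # "Nairobi" — capitalized for display

# split() breaks a delimited string into a list of fields
csv_row = "Alice,88,Nairobi"
fields = csv_row.split(",")
print(fields)  # ['Alice', '88', 'Nairobi']
```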

4. Boolean (bool) — True or False

Booleans are critical in data analytics for filtering and applying conditions.

```python
is_enrolled = True
has_paid = False
passed_exam = True

print(type(is_enrolled))         # <class 'bool'>
print(is_enrolled and has_paid)  # False
print(is_enrolled or has_paid)   # True
```
Checking and Converting Data Types

```python
value = "2025"
print(type(value))  # <class 'str'>

value = int(value)
print(type(value))  # <class 'int'>
print(value + 5)    # 2030
```
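The same pattern works for the other types too: float() parses decimal text, and str() turns numbers back into text before concatenating. A quick sketch:

```python
score_text = "88.5"
score = float(score_text)  # string → float
print(score + 1.5)         # 90.0

year = 2025
label = "Year: " + str(year)  # int → string, so the + means concatenation
print(label)                  # Year: 2025
```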
| Data Type | Example       | Use in Analytics            |
| --------- | ------------- | --------------------------- |
| int       | 45, 2025      | Counts, IDs, years          |
| float     | 3.14, 99.9    | Prices, scores, rates       |
| str       | "Nairobi"     | Names, categories, labels   |
| bool      | True, False   | Flags, conditions, filters  |

Step 4: Data Structures — Organizing Collections of Data
Individual values are useful, but real-world data always comes in collections. Python has four main data structures for this, and understanding them is foundational for data analytics.

1. List — Ordered, Changeable Collection

Lists are the most commonly used data structure. They hold multiple items in order and can be changed after creation.

```python
# A list of student scores
scores = [85, 92, 78, 90, 88]

print(scores[0])   # 85 — first item (index starts at 0)
print(scores[-1])  # 88 — last item
print(scores[1:4]) # [92, 78, 90] — slicing
print(len(scores)) # 5

# Modifying a list
scores.append(95)  # Add to end
scores.remove(78)  # Remove a value
scores.sort()      # Sort ascending

print(scores)      # [85, 88, 90, 92, 95]
```

Why it matters for data: Think of a list like a single column in a spreadsheet — an ordered series of values you can loop through, filter, and analyze.
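Once values live in a list, Python's built-in functions give you basic statistics in one line each. A small sketch reusing the scores idea from above:

```python
scores = [85, 92, 78, 90, 88]

print(sum(scores))                # 433 — total
print(max(scores))                # 92 — highest score
print(min(scores))                # 78 — lowest score
print(sum(scores) / len(scores))  # 86.6 — average
```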

2. Tuple — Ordered, Unchangeable Collection

Tuples look like lists but use parentheses () and cannot be modified after creation. Use them for data that should stay fixed.

```python
# GPS coordinates — should never change
nairobi_location = (-1.2921, 36.8219)

# Student record that shouldn't be altered
student = ("Brian Omondi", "D001", "Data Analytics")

print(nairobi_location[0])  # -1.2921
print(student[1])           # D001

# Trying to change a tuple raises an error
student[0] = "Alice"  # ❌ TypeError
```
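One thing tuples do support is unpacking: assigning each element to its own variable in a single step, which keeps code readable.

```python
nairobi_location = (-1.2921, 36.8219)
latitude, longitude = nairobi_location  # unpack into two named variables

print(latitude)   # -1.2921
print(longitude)  # 36.8219
```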

3. Dictionary — Key-Value Pairs

Dictionaries store data as key: value pairs — like a real-world dictionary where you look up a word (key) to find its meaning (value). This is one of the most powerful structures for analytics.

```python
# A student record as a dictionary
student = {
    "name": "Brian Omondi",
    "age": 24,
    "course": "Python for Data Analytics",
    "score": 88.5,
    "enrolled": True
}

# Accessing values
print(student["name"])   # Brian Omondi
print(student["score"])  # 88.5

# Adding a new key
student["grade"] = "A"

# Updating a value
student["score"] = 91.0

# Looping through a dictionary
for key, value in student.items():
    print(f"{key}: {value}")
```

Output:

```
name: Brian Omondi
age: 24
course: Python for Data Analytics
score: 91.0
enrolled: True
grade: A
```
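Indexing a missing key raises a KeyError, so .get() with a default is often safer. The same method also powers a simple frequency count, a pattern that shows up constantly in analytics:

```python
student = {"name": "Brian Omondi", "score": 88.5}
print(student.get("grade", "N/A"))  # N/A — key missing, so the default is returned

# Counting occurrences with a dictionary
grades = ["A", "B", "A", "C", "B", "A"]
counts = {}
for g in grades:
    counts[g] = counts.get(g, 0) + 1
print(counts)  # {'A': 3, 'B': 2, 'C': 1}
```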

4. Set — Unique, Unordered Collection

Sets automatically remove duplicate values — making them great for finding unique entries in a dataset.

```python
# Finding unique departments in a dataset
departments = {"Sales", "IT", "HR", "Sales", "IT", "Finance"}
print(departments)  # {'Finance', 'HR', 'IT', 'Sales'} — duplicates removed (order may vary)

# Set operations — useful for data comparisons
students_math = {"Alice", "Bob", "Carol", "David"}
students_python = {"Bob", "Carol", "Eve", "Frank"}

print(students_math & students_python)  # {'Bob', 'Carol'} — in both
print(students_math | students_python)  # all students across both classes
print(students_math - students_python)  # {'Alice', 'David'} — math only
```
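A common real-world variation: wrap a list in set() to count distinct values, then sort the result for a stable display order. A quick sketch with invented data:

```python
# A list with repeats, as raw data usually arrives
transactions = ["Sales", "IT", "Sales", "HR", "IT", "Sales"]

unique_departments = set(transactions)
print(len(unique_departments))     # 3 — number of distinct departments
print(sorted(unique_departments))  # ['HR', 'IT', 'Sales'] — sorted list for display
```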

Step 5: Operators — Performing Actions on Data
Operators are the symbols and keywords Python uses to perform actions — calculations, comparisons, and logic. I think of them as the verbs of Python.
Arithmetic Operators — Doing the Math

```python
a = 20
b = 6

print(a + b)   # 26 — Addition
print(a - b)   # 14 — Subtraction
print(a * b)   # 120 — Multiplication
print(a / b)   # 3.3333... — Division (always returns a float)
print(a // b)  # 3 — Floor division (whole number only)
print(a % b)   # 2 — Modulus (remainder)
print(a ** b)  # 64000000 — Exponentiation
```
A practical example:

```python
total_sales = 150000
num_months = 6
monthly_avg = total_sales / num_months
print(f"Average monthly sales: KES {monthly_avg:,.2f}")
```

Output:

```
Average monthly sales: KES 25,000.00
```
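Floor division and modulus often work as a pair, for example when splitting a duration into hours and minutes. The built-in divmod() does both at once:

```python
total_minutes = 135

hours = total_minutes // 60      # 2 — how many full hours fit
minutes = total_minutes % 60     # 15 — what's left over
print(f"{hours}h {minutes}min")  # 2h 15min

# divmod() returns both results as a tuple
print(divmod(135, 60))  # (2, 15)
```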

Comparison Operators — Asking Yes/No Questions
Comparison operators return True or False and are essential for filtering data.

```python
score = 75

print(score > 50)   # True — Greater than
print(score < 50)   # False — Less than
print(score >= 75)  # True — Greater than or equal to
print(score <= 60)  # False — Less than or equal to
print(score == 75)  # True — Equal to
print(score != 80)  # True — Not equal to
```
In context — filtering a list:

```python
scores = [72, 45, 88, 55, 91, 39]
passing = [s for s in scores if s >= 50]
print("Passing scores:", passing)  # [72, 88, 55, 91]
```
Logical Operators — Combining Conditions

```python
age = 22
gpa = 3.8

# Both conditions must be true
eligible = age >= 18 and gpa >= 3.5
print("Eligible for scholarship:", eligible)  # True

# Either condition can be true
has_experience = False
has_degree = True
can_apply = has_experience or has_degree
print("Can apply:", can_apply)  # True

# Reverse a condition
is_complete = False
print("Still in progress:", not is_complete)  # True
```
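These operators combine naturally inside a filter, for example keeping scores within a range. A small sketch:

```python
scores = [72, 45, 88, 55, 91, 39]

# Keep only scores between 50 and 90 (inclusive)
mid_range = [s for s in scores if s >= 50 and s <= 90]
print(mid_range)  # [72, 88, 55]

# Python also allows chained comparisons — same result, tighter syntax
print([s for s in scores if 50 <= s <= 90])  # [72, 88, 55]
```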
Assignment Operators — Shorthand Updates

```python
total = 0
total += 500   # total = total + 500 → 500
total += 300   # → 800
total -= 100   # → 700
total *= 2     # → 1400
total //= 3    # → 466
print(total)   # 466
```
Putting It All Together — A Mini Analytics Example
Here's a program that pulls together everything covered so far — data types, data structures, operators, print(), and input():

```python
print("=== Student Performance Report ===\n")

students = [
    {"name": "Alice", "score": 85},
    {"name": "Bob", "score": 42},
    {"name": "Carol", "score": 91},
    {"name": "David", "score": 57},
    {"name": "Eve", "score": 38},
]

total = 0
passed = 0
failed = 0

for student in students:
    total += student["score"]
    if student["score"] >= 50:
        passed += 1
        status = "PASS ✅"
    else:
        failed += 1
        status = "FAIL ❌"
    print(f"{student['name']}: {student['score']} — {status}")

average = total / len(students)
pass_rate = (passed / len(students)) * 100

print(f"\nTotal Students : {len(students)}")
print(f"Average Score : {average:.1f}")
print(f"Pass Rate : {pass_rate:.0f}%")
print(f"Passed : {passed}")
print(f"Failed : {failed}")
```
Output:

```
=== Student Performance Report ===

Alice: 85 — PASS ✅
Bob: 42 — FAIL ❌
Carol: 91 — PASS ✅
David: 57 — PASS ✅
Eve: 38 — FAIL ❌

Total Students : 5
Average Score : 62.6
Pass Rate : 60%
Passed : 3
Failed : 2
```
This is essentially a mini data analytics pipeline — built entirely from the fundamentals we've covered.

Key Takeaways So Far
Looking back at the journey so far, a few things stand out:

The environment matters. Having the right tools (Jupyter, VS Code, Colab, Spyder) makes learning smoother. Each tool has its own strengths.
Basics are not boring. print(), input(), data types, and operators look simple — but they are the building blocks that everything else in data analytics rests on.
Data structures are how you think about data. Lists, dictionaries, tuples, and sets aren't just Python features — they mirror how data is actually organized in the real world.
Operators are everywhere. Every filter, calculation, and condition in data work uses operators under the hood.

What Comes Next
This is just the foundation. The road ahead includes:

🔁 Control Flow — if/else, for loops, while loops for automating data processing
🔧 Functions — Writing reusable blocks of code
📦 Libraries — NumPy, Pandas, Matplotlib for real data analytics
🗄️ Working with Files — Reading CSVs, Excel files, and JSON data
📊 Data Visualization — Turning numbers into charts that tell stories

Conclusion
Python for data analytics isn't something you learn overnight — it's a skill you build layer by layer. Setting up the right tools, understanding how Python handles data through its types and structures, and learning how to operate on that data are the essential first layers.
If you're starting out like I am, my biggest advice is: don't rush past the basics. The print() function, the humble list, the simple + operator — these things show up in every data script, every analytics notebook, and every machine learning model you'll ever build.
The foundation is worth building well.

Written as part of the LuxDevHQ Intermediate Python Certification Course.
Follow along as I continue documenting my Python learning journey.
