You have been learning individual tools for fourteen posts.
Variables. Loops. Functions. Classes. Files. Error handling. Modules. All separate. All practiced in isolation.
Now you stop practicing in isolation.
This post is a real project. Not a toy example. A complete data processing program that reads student records from a CSV file, cleans the data, calculates statistics, finds the top performers, flags who failed, and writes a summary report. Nearly every piece of Python you learned gets used here.
This is what it feels like when the tools click together.
What We Are Building
A student grade processor. Here is exactly what it does:
Reads a CSV file with student names and exam scores.
Cleans it by removing invalid entries: bad names, missing scores, scores that are not numbers, and scores outside 0 to 100.
Calculates the class average, highest score, lowest score, and pass rate.
Identifies top performers (85 or above) and students who failed (below 40).
Writes a clean summary report to a text file.
Handles a missing input file and a failed report write gracefully.
By the end, you will have a program you could hand to a teacher, and they could actually use it.
Step 1: Create the Data File
First create students.csv in your project folder. Copy this exactly:
name,score
Alex,92
Priya,78
Sam,
Jordan,61
Lisa,45
Ravi,88
null,55
Tom,102
Maya,73
Arjun,34
Zara,91
Ben,abc
Carlos,67
Nina,39
Oscar,85
This data is intentionally messy. Sam has no score. The name "null" is clearly not a real student. Tom's score of 102 is impossible (assume max is 100). Ben's score is "abc" which is not a number. Your program needs to handle all of this without crashing.
Step 2: Build the Loader
Create grade_processor.py.
Start with the function that reads the file.
import csv
from datetime import datetime

def load_students(filename):
    """Read raw name/score rows from a CSV file; return [] if the file is missing."""
    students = []
    try:
        # The csv module recommends opening files with newline=""
        with open(filename, "r", newline="") as file:
            reader = csv.DictReader(file)
            for row in reader:
                students.append({
                    "name": row["name"],
                    "score": row["score"]
                })
    except FileNotFoundError:
        print(f"Error: {filename} not found.")
        return []
    print(f"Loaded {len(students)} raw records.")
    return students
csv.DictReader reads each row as a dictionary where column names are the keys. So each row becomes {"name": "Alex", "score": "92"}. Notice scores come in as strings. You will convert them in the next step.
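To see what csv.DictReader produces without touching a file, here is a small sketch that feeds it an in-memory string through io.StringIO. The sample text and the variable names are made up for illustration.

```python
import csv
import io

# A tiny in-memory stand-in for students.csv (illustrative data only)
sample = "name,score\nAlex,92\nSam,\n"

rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    # Every value is a string, even when it looks like a number
    print(row["name"], repr(row["score"]), type(row["score"]).__name__)
```

Sam's missing score shows up as an empty string, not as a number and not as a crash. That is why the cleaning step has to check for it.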
Step 3: Clean the Data
Raw data is never trustworthy. Write a function that rejects bad records and explains why.
def clean_students(students):
    """Return only valid records and print why each rejected record was dropped."""
    clean = []
    rejected = []
    for student in students:
        # DictReader gives None for a column missing from a short row, so fall back to ""
        name = (student["name"] or "").strip()
        score_raw = (student["score"] or "").strip()
        if not name or name.lower() == "null":
            rejected.append(f"Skipped: invalid name '{name}'")
            continue
        if not score_raw:
            rejected.append(f"Skipped: {name} has no score")
            continue
        try:
            score = float(score_raw)
        except ValueError:
            rejected.append(f"Skipped: {name} has non-numeric score '{score_raw}'")
            continue
        # The chained comparison also rejects "nan", which float() accepts
        if not (0 <= score <= 100):
            rejected.append(f"Skipped: {name} has out-of-range score {score}")
            continue
        clean.append({"name": name, "score": score})
    print("\nCleaning complete:")
    print(f" Valid records: {len(clean)}")
    print(f" Rejected: {len(rejected)}")
    for reason in rejected:
        print(f" {reason}")
    return clean
Every rejection has a reason. You are not silently dropping bad data. You are tracking what got removed and why. In real data work, you always want to know what you lost.
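The same idea can be tried on one value at a time. The sketch below uses a hypothetical helper, parse_score, which is not part of grade_processor.py. It returns either a valid score or the reason the value was rejected. It also rejects "nan", which float() converts without complaint.

```python
def parse_score(raw):
    """Return (score, None) if raw is a valid score, else (None, reason)."""
    text = (raw or "").strip()
    if not text:
        return None, "no score"
    try:
        score = float(text)
    except ValueError:
        return None, f"non-numeric score '{text}'"
    # This comparison is False for nan, so nan is rejected as out of range
    if not (0 <= score <= 100):
        return None, f"out-of-range score {score}"
    return score, None


for raw in ["92", "", "abc", "102", "nan"]:
    print(repr(raw), "->", parse_score(raw))
```

Returning the reason alongside the result keeps the "never drop data silently" rule in one place, so the caller decides whether to print it, log it, or count it.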
Step 4: Analyze the Data
Now that the data is clean, calculate everything useful.
def analyze(students):
    """Compute summary statistics and sorted student lists from clean records."""
    if not students:
        print("No valid students to analyze.")
        return None
    scores = [s["score"] for s in students]
    total = len(scores)
    average = sum(scores) / total
    highest = max(scores)
    lowest = min(scores)
    # Pass mark is 40; top performers scored 85 or above
    passed = [s for s in students if s["score"] >= 40]
    failed = [s for s in students if s["score"] < 40]
    top_performers = [s for s in students if s["score"] >= 85]
    results = {
        "total_students": total,
        "average_score": round(average, 2),
        "highest_score": highest,
        "lowest_score": lowest,
        "pass_count": len(passed),
        "fail_count": len(failed),
        "pass_rate": round((len(passed) / total) * 100, 1),
        "top_performers": sorted(top_performers, key=lambda s: s["score"], reverse=True),
        "failed_students": sorted(failed, key=lambda s: s["score"]),
        "all_students": sorted(students, key=lambda s: s["score"], reverse=True)
    }
    return results
List comprehensions, lambda sorting, basic math, all from previous posts, all working together here doing real work.
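As a quick sanity check, the same arithmetic can be run by hand on the eleven scores from students.csv that survive cleaning. This is only a sketch on bare numbers, not a replacement for analyze.

```python
# The eleven valid scores left in students.csv after cleaning
scores = [92, 78, 61, 45, 88, 73, 34, 91, 67, 39, 85]

total = len(scores)
average = round(sum(scores) / total, 2)          # 753 / 11
passed = [s for s in scores if s >= 40]
pass_rate = round(len(passed) / total * 100, 1)  # 9 of 11
top = sorted((s for s in scores if s >= 85), reverse=True)

print(total, average, pass_rate, top)
# 11 68.45 81.8 [92, 91, 88, 85]
```

Checking a small dataset by hand like this is a cheap way to catch a wrong threshold or a wrong rounding step before trusting the report.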
Step 5: Display the Results
Print a clean, readable summary to the terminal.
def display_results(results):
    if not results:
        return
    print("\n" + "=" * 45)
    print(" STUDENT GRADE REPORT")
    print("=" * 45)
    print(f"\nTotal Students : {results['total_students']}")
    print(f"Class Average : {results['average_score']}")
    print(f"Highest Score : {results['highest_score']}")
    print(f"Lowest Score : {results['lowest_score']}")
    print(f"Pass Rate : {results['pass_rate']}%")
    print(f"Passed : {results['pass_count']}")
    print(f"Failed : {results['fail_count']}")
    print("\n--- Top Performers (85+) ---")
    if results["top_performers"]:
        for s in results["top_performers"]:
            print(f" {s['name']:<15} {s['score']}")
    else:
        print(" None")
    print("\n--- Students Who Failed (<40) ---")
    if results["failed_students"]:
        for s in results["failed_students"]:
            print(f" {s['name']:<15} {s['score']}")
    else:
        print(" None")
    print("\n--- Full Rankings ---")
    for i, s in enumerate(results["all_students"], 1):
        status = "PASS" if s["score"] >= 40 else "FAIL"
        print(f" {i}. {s['name']:<15} {s['score']:<8} {status}")
    print("=" * 45)
{s['name']:<15} left-aligns the name in a 15-character field. Columns line up. Output looks professional, not like a pile of print statements.
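A short sketch of the alignment options, using made-up values. The `|` at the end of each string is only there to make the padding visible.

```python
name, score = "Alex", 92.0

left = f"{name:<15}|"      # pad on the right to 15 characters
right = f"{name:>15}|"     # pad on the left instead
fixed = f"{score:>6.1f}|"  # width 6, one decimal place, right-aligned

print(left)
print(right)
print(fixed)
```

Right-aligning numbers with a fixed number of decimals, as in the last line, keeps the decimal points in one column when scores have different lengths.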
Step 6: Save the Report
Write everything to a file so it can be shared.
def save_report(results, filename="report.txt"):
    """Write the summary and full rankings to a text file."""
    if not results:
        return
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        with open(filename, "w") as file:
            file.write(f"Grade Report - Generated {timestamp}\n")
            file.write("=" * 45 + "\n\n")
            file.write(f"Total Students : {results['total_students']}\n")
            file.write(f"Class Average : {results['average_score']}\n")
            file.write(f"Highest Score : {results['highest_score']}\n")
            file.write(f"Lowest Score : {results['lowest_score']}\n")
            file.write(f"Pass Rate : {results['pass_rate']}%\n\n")
            file.write("Top Performers:\n")
            for s in results["top_performers"]:
                file.write(f" {s['name']}: {s['score']}\n")
            file.write("\nFull Rankings:\n")
            for i, s in enumerate(results["all_students"], 1):
                status = "PASS" if s["score"] >= 40 else "FAIL"
                file.write(f" {i}. {s['name']}: {s['score']} - {status}\n")
        print(f"\nReport saved to {filename}")
    # In Python 3, IOError is just an alias of OSError
    except OSError as e:
        print(f"Could not save report: {e}")
Step 7: Wire It All Together
The main block that runs when you execute the file.
def main():
    print("Starting Grade Processor...")
    print("-" * 45)
    raw_students = load_students("students.csv")
    if not raw_students:
        print("Nothing to process.")
        return
    clean = clean_students(raw_students)
    if not clean:
        print("No valid data after cleaning.")
        return
    results = analyze(clean)
    display_results(results)
    save_report(results)
    print("\nDone.")

if __name__ == "__main__":
    main()
if __name__ == "__main__": means: only run main() if this file is executed directly. If another file imports this file, main() will not run automatically. This is a standard Python pattern you will see everywhere.
Run It
python grade_processor.py
Output should look like:
Starting Grade Processor...
---------------------------------------------
Loaded 15 raw records.
Cleaning complete:
Valid records: 11
Rejected: 4
Skipped: Sam has no score
Skipped: invalid name 'null'
Skipped: Tom has out-of-range score 102.0
Skipped: Ben has non-numeric score 'abc'
=============================================
STUDENT GRADE REPORT
=============================================
Total Students : 11
Class Average : 68.45
Highest Score : 92.0
Lowest Score : 34.0
Pass Rate : 81.8%
Passed : 9
Failed : 2
--- Top Performers (85+) ---
Alex            92.0
Zara            91.0
Ravi            88.0
Oscar           85.0
--- Students Who Failed (<40) ---
Arjun           34.0
Nina            39.0
--- Full Rankings ---
1. Alex            92.0     PASS
2. Zara            91.0     PASS
3. Ravi            88.0     PASS
...
=============================================
Report saved to report.txt
Done.
Now Break It and Extend It
This is where real learning happens. Try all of these:
Delete students.csv and run it. Does it handle the missing file gracefully?
Add five more students directly to the CSV, some with valid scores, some without.
Change the pass mark from 40 to 50 and see who moves from pass to fail.
Add a new function called get_grade_letter that converts a numeric score to A, B, C, D, or F and include the letter grade in the report.
Add a function that groups students by grade letter and shows how many are in each group.
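If you get stuck on the last two, here is one possible starting point. It is only a sketch: the grade boundaries are invented for illustration, and count_by_grade is a hypothetical name, so adjust both to your own scheme.

```python
from collections import Counter

# Hypothetical grade boundaries: (minimum score, letter), highest first
GRADE_BOUNDARIES = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]


def get_grade_letter(score):
    """Return the first letter whose minimum score is met, else "F"."""
    for minimum, letter in GRADE_BOUNDARIES:
        if score >= minimum:
            return letter
    return "F"


def count_by_grade(students):
    """Count how many students fall into each grade letter."""
    return Counter(get_grade_letter(s["score"]) for s in students)


sample = [
    {"name": "Alex", "score": 92.0},
    {"name": "Priya", "score": 78.0},
    {"name": "Lisa", "score": 45.0},
    {"name": "Arjun", "score": 34.0},
]
for s in sample:
    print(s["name"], get_grade_letter(s["score"]))
print(dict(count_by_grade(sample)))
```

Keeping the boundaries in one list means changing the scheme is a one-line edit, the same reasoning that makes a single pass-mark constant easier to change than four scattered copies of 40.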
What Just Happened
You used nearly every major concept from Phase 1 in one project.
Functions to organize logic. Classes were not needed here, but the project structure mirrors how you would use them. File I/O to read the CSV and write the report. Error handling to catch bad files and bad data. List comprehensions to filter and transform. Lambda for sorting. Modules for csv and datetime. f-strings throughout.
Nothing in this project was new. It was all tools you already had. The project just showed you what they look like when they work together.
The next phase starts with the most important question: what does math actually do in AI, and how much of it do you really need to know?