DEV Community

Akhilesh

Build a Data Processor: Your First Real Python Project

You have been learning individual tools for fourteen posts.

Variables. Loops. Functions. Classes. Files. Error handling. Modules. All separate. All practiced in isolation.

Now you stop practicing in isolation.

This post is a real project. Not a toy example. A complete data processing program that reads student records from a CSV file, cleans the data, calculates statistics, finds the top performers, flags who failed, and writes a summary report. Almost every piece of Python you learned gets used here.

This is what it feels like when the tools click together.


What We Are Building

A student grade processor. Here is exactly what it does:

Reads a CSV file with student names and exam scores.

Cleans it by removing invalid entries: empty rows, placeholder names, missing scores, scores that are not numbers, and scores outside 0 to 100.

Calculates the class average, highest score, lowest score, and pass rate.

Identifies top performers (85 and above) and students who failed (below 40).

Writes a clean summary report to a text file.

Handles a missing input file or a failed write without crashing.

By the end you will have a program you could actually hand to a teacher and they could use it.


Step 1: Create the Data File

First create students.csv in your project folder. Copy this exactly:

name,score
Alex,92
Priya,78
Sam,
Jordan,61
Lisa,45
Ravi,88
null,55
Tom,102
Maya,73
Arjun,34
Zara,91
Ben,abc
Carlos,67
Nina,39
Oscar,85

This data is intentionally messy. Sam has no score. The name "null" is clearly not a real student. Tom's score of 102 is impossible (assume max is 100). Ben's score is "abc" which is not a number. Your program needs to handle all of this without crashing.


Step 2: Build the Loader

Create grade_processor.py.

Start with the function that reads the file.

import csv
from datetime import datetime


def load_students(filename):
    students = []

    try:
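        # newline="" is what the csv module documentation recommends for files passed to a reader.
        with open(filename, "r", newline="", encoding="utf-8") as file: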
            reader = csv.DictReader(file)
            for row in reader:
                students.append({
                    "name": row["name"],
                    "score": row["score"]
                })
    except FileNotFoundError:
        print(f"Error: {filename} not found.")
        return []

    print(f"Loaded {len(students)} raw records.")
    return students

csv.DictReader reads each row as a dictionary where column names are the keys. So each row becomes {"name": "Alex", "score": "92"}. Notice scores come in as strings. You will convert them in the next step.
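
If you want to see this behavior without touching the disk, here is a minimal sketch that feeds csv.DictReader an in-memory string through io.StringIO. The sample text and variable names are made up for illustration.

```python
import csv
import io

# Two sample rows in memory; io.StringIO makes a string behave like an open file.
sample = "name,score\nAlex,92\nSam,\n"

rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    # Every value is a string, even when it looks like a number.
    print(row["name"], repr(row["score"]), type(row["score"]).__name__)
```

Sam's missing score arrives as an empty string, not as 0, which is why the cleaning step has to check for it explicitly.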


Step 3: Clean the Data

Raw data is never trustworthy. Write a function that rejects bad records and explains why.

def clean_students(students):
    clean = []
    rejected = []

    for student in students:
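        # DictReader fills a missing column with None, so fall back to "" before stripping.
        name = (student["name"] or "").strip()
        score_raw = (student["score"] or "").strip()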

        if not name or name.lower() == "null":
            rejected.append(f"Skipped: invalid name '{name}'")
            continue

        if not score_raw:
            rejected.append(f"Skipped: {name} has no score")
            continue

        try:
            score = float(score_raw)
        except ValueError:
            rejected.append(f"Skipped: {name} has non-numeric score '{score_raw}'")
            continue

        # Chained comparison: also rejects NaN, which float() accepts for the text "nan".
        if not 0 <= score <= 100:
            rejected.append(f"Skipped: {name} has out-of-range score {score}")
            continue

        clean.append({"name": name, "score": score})

    print("\nCleaning complete:")
    print(f"  Valid records: {len(clean)}")
    print(f"  Rejected: {len(rejected)}")
    for reason in rejected:
        print(f"  {reason}")

    return clean

Every rejection has a reason. You are not silently dropping bad data. You are tracking what got removed and why. In real data work, you always want to know what you lost.
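
One subtlety worth knowing: float() accepts more than plain digits. It tolerates surrounding whitespace, and it also parses the text "nan", which is not a score anyone can grade. The sketch below shows one way to guard against that. parse_score is a hypothetical helper written for this illustration, not part of grade_processor.py.

```python
import math


def parse_score(text):
    """Return a float between 0 and 100, or None if the text is not a usable score.

    Hypothetical helper for illustration only.
    """
    try:
        score = float(text)
    except ValueError:
        return None
    # float("nan") parses without raising, so rule it out explicitly.
    if math.isnan(score) or not 0 <= score <= 100:
        return None
    return score


for text in ["92", " 78.5 ", "abc", "", "102", "-1", "nan"]:
    print(repr(text), "->", parse_score(text))
```

Only the first two inputs come back as numbers. Everything else returns None.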


Step 4: Analyze the Data

Now that the data is clean, calculate everything useful.

def analyze(students):
    if not students:
        print("No valid students to analyze.")
        return None

    scores = [s["score"] for s in students]

    total = len(scores)
    average = sum(scores) / total
    highest = max(scores)
    lowest = min(scores)
    passed = [s for s in students if s["score"] >= 40]
    failed = [s for s in students if s["score"] < 40]
    top_performers = [s for s in students if s["score"] >= 85]

    results = {
        "total_students": total,
        "average_score": round(average, 2),
        "highest_score": highest,
        "lowest_score": lowest,
        "pass_count": len(passed),
        "fail_count": len(failed),
        "pass_rate": round((len(passed) / total) * 100, 1),
        "top_performers": sorted(top_performers, key=lambda s: s["score"], reverse=True),
        "failed_students": sorted(failed, key=lambda s: s["score"]),
        "all_students": sorted(students, key=lambda s: s["score"], reverse=True)
    }

    return results

List comprehensions, lambda sorting, basic math, all from previous posts, all working together here doing real work.
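
To check the logic by hand, it can help to run the same three techniques on a sample small enough to verify in your head. The three records below are made up for illustration.

```python
# A tiny made-up sample, small enough to check by hand.
sample = [
    {"name": "A", "score": 50.0},
    {"name": "B", "score": 30.0},
    {"name": "C", "score": 90.0},
]

# List comprehension: pull out just the scores.
scores = [s["score"] for s in sample]
average = sum(scores) / len(scores)  # (50 + 30 + 90) / 3

# List comprehension with a filter: keep only passing students.
passed = [s for s in sample if s["score"] >= 40]
pass_rate = round(len(passed) / len(sample) * 100, 1)  # 2 of 3 passed

# Lambda as a sort key: highest score first.
ranked = sorted(sample, key=lambda s: s["score"], reverse=True)

print(round(average, 2))            # 56.67
print(pass_rate)                    # 66.7
print([s["name"] for s in ranked])  # ['C', 'A', 'B']
```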


Step 5: Display the Results

Print a clean, readable summary to the terminal.

def display_results(results):
    if not results:
        return

    print("\n" + "=" * 45)
    print("        STUDENT GRADE REPORT")
    print("=" * 45)

    print(f"\nTotal Students : {results['total_students']}")
    print(f"Class Average  : {results['average_score']}")
    print(f"Highest Score  : {results['highest_score']}")
    print(f"Lowest Score   : {results['lowest_score']}")
    print(f"Pass Rate      : {results['pass_rate']}%")
    print(f"Passed         : {results['pass_count']}")
    print(f"Failed         : {results['fail_count']}")

    print("\n--- Top Performers (85+) ---")
    if results["top_performers"]:
        for s in results["top_performers"]:
            print(f"  {s['name']:<15} {s['score']}")
    else:
        print("  None")

    print("\n--- Students Who Failed (<40) ---")
    if results["failed_students"]:
        for s in results["failed_students"]:
            print(f"  {s['name']:<15} {s['score']}")
    else:
        print("  None")

    print("\n--- Full Rankings ---")
    for i, s in enumerate(results["all_students"], 1):
        status = "PASS" if s["score"] >= 40 else "FAIL"
        print(f"  {i}. {s['name']:<15} {s['score']:<8} {status}")

    print("=" * 45)

{s['name']:<15} left-aligns the name in a 15-character field. Columns line up. Output looks professional, not like a pile of print statements.
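
The same format mini-language has a few more options that are handy for reports. A short sketch, with the trailing | added only so the padding is visible:

```python
name, score = "Alex", 92.0

left = f"{name:<15}|"            # pad on the right to 15 characters
right = f"{name:>15}|"           # pad on the left to 15 characters
one_decimal = f"{score:>6.1f}|"  # width 6, exactly one digit after the point

print(left)
print(right)
print(one_decimal)
```

Right-aligning numbers with a fixed number of decimals keeps the decimal points in one column, which is usually easier to scan than left-aligned numbers.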


Step 6: Save the Report

Write everything to a file so it can be shared.

def save_report(results, filename="report.txt"):
    if not results:
        return

    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    try:
        with open(filename, "w") as file:
            file.write(f"Grade Report - Generated {timestamp}\n")
            file.write("=" * 45 + "\n\n")
            file.write(f"Total Students : {results['total_students']}\n")
            file.write(f"Class Average  : {results['average_score']}\n")
            file.write(f"Highest Score  : {results['highest_score']}\n")
            file.write(f"Lowest Score   : {results['lowest_score']}\n")
            file.write(f"Pass Rate      : {results['pass_rate']}%\n\n")

            file.write("Top Performers:\n")
            for s in results["top_performers"]:
                file.write(f"  {s['name']}: {s['score']}\n")

            file.write("\nFull Rankings:\n")
            for i, s in enumerate(results["all_students"], 1):
                status = "PASS" if s["score"] >= 40 else "FAIL"
                file.write(f"  {i}. {s['name']}: {s['score']} - {status}\n")

        print(f"\nReport saved to {filename}")

    except OSError as e:  # IOError is an alias of OSError in Python 3
        print(f"Could not save report: {e}")

Step 7: Wire It All Together

The main block that runs when you execute the file.

def main():
    print("Starting Grade Processor...")
    print("-" * 45)

    raw_students = load_students("students.csv")

    if not raw_students:
        print("Nothing to process.")
        return

    clean = clean_students(raw_students)

    if not clean:
        print("No valid data after cleaning.")
        return

    results = analyze(clean)
    display_results(results)
    save_report(results)

    print("\nDone.")


if __name__ == "__main__":
    main()

if __name__ == "__main__": means: only run main() if this file is executed directly. If another file imports this file, main() will not run automatically. This is a standard Python pattern you will see everywhere.


Run It

python grade_processor.py

Output should look like:

Starting Grade Processor...
---------------------------------------------
Loaded 15 raw records.

Cleaning complete:
  Valid records: 11
  Rejected: 4
  Skipped: Sam has no score
  Skipped: invalid name 'null'
  Skipped: Tom has out-of-range score 102.0
  Skipped: Ben has non-numeric score 'abc'

=============================================
        STUDENT GRADE REPORT
=============================================

Total Students : 11
Class Average  : 68.45
Highest Score  : 92.0
Lowest Score   : 34.0
Pass Rate      : 81.8%
Passed         : 9
Failed         : 2

--- Top Performers (85+) ---
  Alex            92.0
  Zara            91.0
  Ravi            88.0
  Oscar           85.0

--- Students Who Failed (<40) ---
  Arjun           34.0
  Nina            39.0

--- Full Rankings ---
  1. Alex            92.0     PASS
  2. Zara            91.0     PASS
  3. Ravi            88.0     PASS
  ...
=============================================

Report saved to report.txt

Done.

Now Break It and Extend It

This is where real learning happens. Try all of these:

Delete students.csv and run it. Does it handle the missing file gracefully?

Add five more students directly to the CSV, some with valid scores, some without.

Change the pass mark from 40 to 50 and see who moves from pass to fail.

Add a new function called get_grade_letter that converts a numeric score to A, B, C, D, or F and include the letter grade in the report.

Add a function that groups students by grade letter and shows how many are in each group.
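
If you get stuck on the last two, here is one possible starting point. The cutoffs (90, 80, 70, 60) are an assumption, so swap in whatever scale your class uses. count_by_grade is a hypothetical name, and the sample list stands in for the cleaned records.

```python
from collections import Counter


def get_grade_letter(score):
    """Map a 0-100 score to a letter. The cutoffs here are assumptions."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


def count_by_grade(students):
    """Count how many students fall into each letter grade."""
    return Counter(get_grade_letter(s["score"]) for s in students)


sample = [
    {"name": "Alex", "score": 92.0},
    {"name": "Priya", "score": 78.0},
    {"name": "Arjun", "score": 34.0},
    {"name": "Zara", "score": 91.0},
]
counts = count_by_grade(sample)
for letter in "ABCDF":
    # A Counter returns 0 for letters nobody received.
    print(letter, counts[letter])
```

Try writing your own version first. Comparing it with this one afterward will teach you more than copying it.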


What Just Happened

You used nearly every major concept from Phase 1 in one project.

Functions to organize logic. Classes were not needed here but the project structure mirrors how you would use them. File I/O to read the CSV and write the report. Error handling to catch bad files and bad data. List comprehensions to filter and transform. Lambda for sorting. Modules for csv and datetime. f-strings throughout.

Nothing in this project was new. It was all tools you already had. The project just showed you what they look like when they work together.

The next part of the series starts with the most important question: what does math actually do in AI, and how much of it do you really need to know?