Why I finally stopped writing God classes in Python (and what I learned)

#cleancode #softwaredevelopment #programming #bestpractices

Quick context

Honestly, I still cringe when I think about the first big feature I shipped at my last job. We had a ReportGenerator class that pulled data from an API, cleaned it, ran a bunch of business rules, rendered a PDF, and then shoved the file into an S3 bucket. It was ~400 lines long, had more private helper methods than I could keep track of, and every time someone asked for a tiny tweak—like changing the date format—I ended up digging through a maze of if statements and side‑effects. I spent three hours one Friday just trying to figure out why a unit test was failing, only to discover that a change in the PDF layout had silently broken the validation step because they shared the same internal state. That’s when I realized the class wasn’t just “big”; it was doing too many things at once, and the cost was showing up in bugs, slow onboarding, and a dread of touching the code.

The Insight

The single idea that changed how I write code is the Single Responsibility Principle (SRP): a class—or even a function—should have one reason to change. When you obey SRP, you end up with pieces that are easier to test, easier to reuse, and far less likely to break when you tweak something unrelated. It sounds obvious, but in practice it’s easy to slip into the “just add one more method here” trap because it feels faster in the moment. The trade‑off is a little more upfront scaffolding, but the payoff in maintainability is massive.

How (with code)

The before: a classic God class

class ReportGenerator:
    def __init__(self, api_client, s3_bucket):
        self.api = api_client
        self.bucket = s3_bucket
        self._raw_data = None
        self._cleaned = None
        self._pdf_bytes = None

    def fetch(self):
        # Talk to external service
        self._raw_data = self.api.get("/metrics")
        return self

    def validate(self):
        # Business rules that belong nowhere else
        if not self._raw_data:
            raise ValueError("No data to validate")
        for row in self._raw_data:
            if row["value"] < 0:
                raise ValueError("Negative metric found")
        return self

    def transform(self):
        # Data cleaning + formatting
        self._cleaned = [
            {
                "date": row["timestamp"].strftime("%Y-%m-%d"),
                "value": round(float(row["value"]), 2),
            }
            for row in self._raw_data
        ]
        return self

    def render_pdf(self):
        # PDF generation (heavy library, side‑effects)
        from io import BytesIO  # needed for the in-memory buffer below

        from reportlab.lib.pagesizes import letter
        from reportlab.pdfgen import canvas

        buffer = BytesIO()
        c = canvas.Canvas(buffer, pagesize=letter)
        y = 750
        for item in self._cleaned:
            c.drawString(50, y, f"{item['date']}: {item['value']}")
            y -= 20
        c.showPage()
        c.save()
        self._pdf_bytes = buffer.getvalue()
        return self

    def upload(self):
        # Side‑effect: push to S3
        if not self._pdf_bytes:
            raise RuntimeError("PDF not rendered")
        self.bucket.put_object(
            Key="report.pdf", Body=self._pdf_bytes, ContentType="application/pdf"
        )

What’s wrong here?

The class knows how to get data, what makes it valid, how to format it, how to draw a PDF, and where to store it.
Change the validation rule? You have to touch the same class that also knows about PDF fonts.
Unit testing validate means you either stub the API client so fetch can populate _raw_data, or reach into that private attribute by hand.
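Every step talks to the others through the same private attributes (_raw_data, _cleaned, _pdf_bytes), so a change made for the PDF step can break a step that looks unrelated, and the methods only work when called in the right order.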
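To make the testing pain concrete, here is a minimal sketch of what it takes to check the "negative metric" rule on the God class. The ReportGenerator below is a trimmed copy with only fetch and validate kept. StubApiClient and validation_error_for are hypothetical helpers I made up for illustration; they are not part of the original code.

```python
class ReportGenerator:
    """Trimmed copy of the class above: only fetch and validate are kept."""

    def __init__(self, api_client, s3_bucket):
        self.api = api_client
        self.bucket = s3_bucket
        self._raw_data = None

    def fetch(self):
        self._raw_data = self.api.get("/metrics")
        return self

    def validate(self):
        if not self._raw_data:
            raise ValueError("No data to validate")
        for row in self._raw_data:
            if row["value"] < 0:
                raise ValueError("Negative metric found")
        return self


class StubApiClient:
    """Hypothetical stand-in for the real API client (an assumption)."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, path):
        # Ignores the path and returns the canned rows.
        return self._rows


def validation_error_for(rows):
    """Run fetch + validate on stubbed rows; return the error message or None."""
    generator = ReportGenerator(StubApiClient(rows), s3_bucket=None)
    try:
        generator.fetch().validate()
    except ValueError as exc:
        return str(exc)
    return None
```

Notice that a test for a three-line business rule has to build a fake API client and pass a placeholder bucket, even though neither has anything to do with validation.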

The after: splitting responsibilities

# 1️⃣ Data access – only knows how to get raw data
class MetricsFetcher:
    def __init__(self, api_client):
        self.api = api_client

    def fetch(self):
        return self.api.get("/metrics")


# 2️⃣ Validation – pure function, no side effects
def validate_metrics(raw_data):
    if not raw_data:
        raise ValueError("No data to validate")
    for row in raw_data:
        if row["value"] < 0:
            raise ValueError("Negative metric found")
    return True


# 3️⃣ Transformation – turns raw dicts into clean dicts
def clean_metrics(raw_data):
    return [
        {
            "date": row["timestamp"].strftime("%Y-%m-%d"),
            "value": round(float(row["value"]), 2),
        }
        for row in raw_data
    ]


# 4️⃣ Rendering – knows only about PDF creation
class PDFRenderer:
    def render(self, cleaned_data):
        from reportlab.lib.pagesizes import letter
        from reportlab.pdfgen import canvas
        from io import BytesIO

        buffer = BytesIO()
        c = canvas.Canvas(buffer, pagesize=letter)
        y = 750
        for item in cleaned_data:
            c.drawString(50, y, f"{item['date']}: {item['value']}")
            y -= 20
        c.showPage()
        c.save()
        return buffer.getvalue()


# 5️⃣ Upload – knows only about S3
class S3Uploader:
    def __init__(self, bucket):
        self.bucket = bucket

    def upload(self, pdf_bytes, key="report.pdf"):
        self.bucket.put_object(
            Key=key, Body=pdf_bytes, ContentType="application/pdf"
        )

Now the workflow looks like this:

fetcher = MetricsFetcher(api_client)
raw = fetcher.fetch()

validate_metrics(raw)          # raises if bad
cleaned = clean_metrics(raw)

renderer = PDFRenderer()
pdf_bytes = renderer.render(cleaned)

uploader = S3Uploader(s3_bucket)
uploader.upload(pdf_bytes)
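If you don't want that wiring repeated at every call site, one option is a thin coordinator that only sequences the steps. This is a sketch, not something from my original codebase: generate_report is a hypothetical name, and it assumes the collaborators expose the same fetch, render, and upload methods shown above.

```python
def generate_report(fetcher, validate, clean, renderer, uploader, key="report.pdf"):
    """Hypothetical coordinator: runs the five steps in order, returns the PDF bytes.

    It holds no state and makes no decisions of its own, so its only
    reason to change is the order or number of steps.
    """
    raw = fetcher.fetch()
    validate(raw)  # expected to raise on bad data, which stops the pipeline
    cleaned = clean(raw)
    pdf_bytes = renderer.render(cleaned)
    uploader.upload(pdf_bytes, key=key)
    return pdf_bytes
```

With the pieces from this post, a call would look like generate_report(MetricsFetcher(api_client), validate_metrics, clean_metrics, PDFRenderer(), S3Uploader(s3_bucket)). Because every collaborator is passed in, a test can hand it small fakes instead of a real API client or bucket.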

Each piece has a single reason to change:

If the API endpoint changes, only MetricsFetcher needs to change.
If a new validation rule appears, you edit validate_metrics (or add another validator) without touching PDF generation.
If you need to swap ReportLab for WeasyPrint, you only modify PDFRenderer.
Testing gets much simpler: you can pass a list of dicts straight into validate_metrics or clean_metrics and assert on the result, with no mocks required.
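Here is roughly what those mock-free tests can look like. The two functions are copied from above so the snippet runs on its own. The test names and sample rows are my own, and the sketch assumes timestamp arrives as a datetime object, which is what the strftime call implies.

```python
from datetime import datetime


def validate_metrics(raw_data):
    if not raw_data:
        raise ValueError("No data to validate")
    for row in raw_data:
        if row["value"] < 0:
            raise ValueError("Negative metric found")
    return True


def clean_metrics(raw_data):
    return [
        {
            "date": row["timestamp"].strftime("%Y-%m-%d"),
            "value": round(float(row["value"]), 2),
        }
        for row in raw_data
    ]


def test_validate_rejects_negative_values():
    rows = [{"timestamp": datetime(2024, 1, 5), "value": -1}]
    try:
        validate_metrics(rows)
    except ValueError as exc:
        assert str(exc) == "Negative metric found"
    else:
        raise AssertionError("expected ValueError")


def test_clean_formats_date_and_rounds_value():
    rows = [{"timestamp": datetime(2024, 1, 5, 9, 30), "value": "3.14159"}]
    assert clean_metrics(rows) == [{"date": "2024-01-05", "value": 3.14}]
```

Both tests are plain functions with plain data, so they work under pytest or any runner that collects test_ functions. No API client, PDF library, or bucket is involved.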

Why This Matters

The real cost of ignoring SRP isn’t just “slightly messy code.” It shows up as:

Debugging hell – a change in one area ripples through unrelated logic, making root‑cause hunting a guessing game.
Onboarding friction – new teammates have to hold the whole class in their head just to add a tiny feature.
Test brittleness – you end up writing overly complex mocks or skipping tests altogether because the setup is painful.
Deployment risk – a tiny tweak to the PDF layout could accidentally break validation because they share state, leading to bugs that only appear in production.

When I started breaking those responsibilities apart, the number of regressions dropped noticeably. Code reviews became faster because each pull request touched a clearly bounded piece. And honestly, it felt good to open a file and see a class that did one thing well, rather than a beast that tried to do everything.

Challenge

Take a look at your current codebase. Find a class or function that handles more than two distinct concerns (think: data fetching + validation + formatting + persistence). Try extracting one of those concerns into its own helper or class. Notice how the rest of the code becomes easier to reason about and test. What did you discover about the hidden couplings you were carrying around? Share your findings in the comments—I’m curious to see where you spot the next SRP win.