Why I finally stopped writing God classes in Python (and what I learned)
Quick context (why you're writing this)
Honestly, I still cringe when I think about the first big feature I shipped at my last job. We had a ReportGenerator class that pulled data from an API, cleaned it, ran a bunch of business rules, rendered a PDF, and then shoved the file into an S3 bucket. It was ~400 lines long, had more private helper methods than I could keep track of, and every time someone asked for a tiny tweak—like changing the date format—I ended up digging through a maze of if statements and side‑effects. I spent three hours one Friday just trying to figure out why a unit test was failing, only to discover that a change in the PDF layout had silently broken the validation step because they shared the same internal state. That’s when I realized the class wasn’t just “big”; it was doing too many things at once, and the cost was showing up in bugs, slow onboarding, and a dread of touching the code.
The Insight
The single idea that changed how I write code is the Single Responsibility Principle (SRP): a class—or even a function—should have one reason to change. When you obey SRP, you end up with pieces that are easier to test, easier to reuse, and far less likely to break when you tweak something unrelated. It sounds obvious, but in practice it’s easy to slip into the “just add one more method here” trap because it feels faster in the moment. The trade‑off is a little more upfront scaffolding, but the payoff in maintainability is massive.
How (with code)
The before: a classic God class
class ReportGenerator:
def __init__(self, api_client, s3_bucket):
self.api = api_client
self.bucket = s3_bucket
self._raw_data = None
self._cleaned = None
self._pdf_bytes = None
def fetch(self):
# Talk to external service
self._raw_data = self.api.get("/metrics")
return self
def validate(self):
# Business rules that belong nowhere else
if not self._raw_data:
raise ValueError("No data to validate")
for row in self._raw_data:
if row["value"] < 0:
raise ValueError("Negative metric found")
return self
def transform(self):
# Data cleaning + formatting
self._cleaned = [
{
"date": row["timestamp"].strftime("%Y-%m-%d"),
"value": round(float(row["value"]), 2),
}
for row in self._raw_data
]
return self
def render_pdf(self):
# PDF generation (heavy library, side‑effects)
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
buffer = BytesIO()
c = canvas.Canvas(buffer, pagesize=letter)
y = 750
for item in self._cleaned:
c.drawString(50, y, f"{item['date']}: {item['value']}")
y -= 20
c.showPage()
c.save()
self._pdf_bytes = buffer.getvalue()
return self
def upload(self):
# Side‑effect: push to S3
if not self._pdf_bytes:
raise RuntimeError("PDF not rendered")
self.bucket.put_object(
Key="report.pdf", Body=self._pdf_bytes, ContentType="application/pdf"
)
What’s wrong here?
- The class knows how to get data, what makes it valid, how to format it, how to draw a PDF, and where to store it.
- Change the validation rule? You have to touch the same class that also knows about PDF fonts.
- Unit testing
validatemeans you also have to mock the API call becausefetchsets_raw_data. - If the PDF library upgrades and breaks something, you risk breaking validation because they share internal state.
The after: splitting responsibilities
# 1️⃣ Data access – only knows how to get raw data
class MetricsFetcher:
def __init__(self, api_client):
self.api = api_client
def fetch(self):
return self.api.get("/metrics")
# 2️⃣ Validation – pure function, no side effects
def validate_metrics(raw_data):
if not raw_data:
raise ValueError("No data to validate")
for row in raw_data:
if row["value"] < 0:
raise ValueError("Negative metric found")
return True
# 3️⃣ Transformation – turns raw dicts into clean dicts
def clean_metrics(raw_data):
return [
{
"date": row["timestamp"].strftime("%Y-%m-%d"),
"value": round(float(row["value"]), 2),
}
for row in raw_data
]
# 4️⃣ Rendering – knows only about PDF creation
class PDFRenderer:
def render(self, cleaned_data):
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from io import BytesIO
buffer = BytesIO()
c = canvas.Canvas(buffer, pagesize=letter)
y = 750
for item in cleaned_data:
c.drawString(50, y, f"{item['date']}: {item['value']}")
y -= 20
c.showPage()
c.save()
return buffer.getvalue()
# 5️⃣ Upload – knows only about S3
class S3Uploader:
def __init__(self, bucket):
self.bucket = bucket
def upload(self, pdf_bytes, key="report.pdf"):
self.bucket.put_object(
Key=key, Body=pdf_bytes, ContentType="application/pdf"
)
Now the workflow looks like this:
fetcher = MetricsFetcher(api_client)
raw = fetcher.fetch()
validate_metrics(raw) # raises if bad
cleaned = clean_metrics(raw)
renderer = PDFRenderer()
pdf_bytes = renderer.render(cleaned)
uploader = S3Uploader(s3_bucket)
uploader.upload(pdf_bytes)
Each piece has a single reason to change:
- If the API endpoint changes, only
MetricsFetchertouches the code. - If a new validation rule appears, you edit
validate_metrics(or add another validator) without touching PDF generation. - If you need to swap ReportLab for WeasyPrint, you only modify
PDFRenderer. - Testing becomes trivial: you can feed a dict straight into
validate_metricsorclean_metricsand assert the output, no mocks required.
Why This Matters
The real cost of ignoring SRP isn’t just “a bit messy code.” It shows up as:
- Debugging hell – a change in one area ripples through unrelated logic, making root‑cause hunting a guessing game.
- Onboarding friction – new teammates have to hold the whole class in their head just to add a tiny feature.
- Test brittleness – you end up writing overly complex mocks or skipping tests altogether because the setup is painful.
- Deployment risk – a tiny tweak to the PDF layout could accidentally break validation because they share state, leading to bugs that only appear in production.
When I started breaking those responsibilities apart, the number of regressions dropped noticeably. Code reviews became faster because each pull request touched a clearly bounded piece. And honestly, it felt good to open a file and see a class that did one thing well, rather than a beast that tried to do everything.
Challenge
Take a look at your current codebase. Find a class or function that handles more than two distinct concerns (think: data fetching + validation + formatting + persistence). Try extracting one of those concerns into its own helper or class. Notice how the rest of the code becomes easier to reason about and test. What did you discover about the hidden couplings you were carrying around? Share your findings in the comments—I’m curious to see where you spot the next SRP win.
Top comments (0)