Backend Quality Gates
See also: Frontend Quality Gates
We picked the boring stack. Python. FastAPI. The technologies AI understands. Now we make sure AI doesn't write garbage.
Code reviews are a beautiful fantasy we tell ourselves. "Someone will catch my mistakes." No they won't. They're checking Slack. They're thinking about lunch. They're wondering if that meeting could've been an email.
Meanwhile, your time.sleep() sits there in async code, waiting to murder production at 3 AM on a Saturday. Nobody will catch it. Nobody ever does.
So I stopped pretending humans review code. I let robots do it. Robots don't get hungry. Robots don't have feelings. Robots are perfect for this job.
And here's the kicker: none of this is even testing. Not a single test runs. This is just linting. Glorified spell-check for code. We haven't even started verifying that the code does what it's supposed to do. We're just making sure it's not obviously broken before we bother checking if it works.
The bar is on the floor. And most codebases still trip over it.
Ruff: Because Life Is Too Short for Slow Linters
Remember pylint? You'd run it, go make coffee, come back, still running. So everyone disabled it. Problem solved. Also: problems not solved at all.
Ruff is written in Rust. It runs in milliseconds. You can't even alt-tab fast enough to avoid it.
```toml
[tool.ruff.lint]
select = [
    "E",      # pycodestyle errors
    "W",      # pycodestyle warnings
    "F",      # pyflakes (undefined names, unused imports)
    "I",      # isort (import ordering)
    "B",      # flake8-bugbear (common bugs)
    "C4",     # flake8-comprehensions
    "UP",     # pyupgrade (modern syntax)
    "ASYNC",  # flake8-async (async safety)
]
```
That ASYNC rule at the bottom? That's the one that saves your weekends.
The Async Footgun
Here's how to tank your server in one line:
```python
import time

async def fetch_data():
    time.sleep(1)  # Looks innocent. Isn't.
    return await get_data()
```
This blocks the entire event loop. Every user. Every request. Everything stops while your code takes a little nap. No error. No warning. Just... silence. And then your phone rings at 3 AM.
"The site is slow."
No kidding. You put time.sleep() in async code.
Ruff catches this:
```
ASYNC251 Async functions should not call `time.sleep`
```
The fix takes two seconds:
```python
import asyncio

async def fetch_data():
    await asyncio.sleep(1)  # Look ma, no blocking
    return await get_data()
```
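To put "everything stops" into numbers, here's a small sketch that fires five concurrent fake requests three ways: with time.sleep, with asyncio.sleep, and with asyncio.to_thread (standard library, Python 3.9+), the usual escape hatch when a blocking call can't be removed. The handler names are hypothetical. Expect the blocking version to take roughly five times as long as the other two, because the requests queue up behind each other; exact timings depend on the machine.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def blocking_request() -> None:
    time.sleep(0.1)  # freezes the whole event loop for 100 ms


async def polite_request() -> None:
    await asyncio.sleep(0.1)  # hands control back to the loop while waiting


async def offloaded_request() -> None:
    # Escape hatch for a blocking call you can't rewrite: run it in a worker thread.
    await asyncio.to_thread(time.sleep, 0.1)


async def time_concurrent(handler: Callable[[], Awaitable[None]], n: int = 5) -> float:
    """Fire n copies of handler concurrently and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def main() -> None:
    for handler in (blocking_request, polite_request, offloaded_request):
        elapsed = await time_concurrent(handler)
        print(f"{handler.__name__}: {elapsed:.2f}s for 5 concurrent requests")


if __name__ == "__main__":
    asyncio.run(main())
```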
I make this mistake weekly. Sometimes daily. My brain refuses to learn. Fortunately, Ruff doesn't care about my brain. Ruff just yells. That's the relationship.
MyPy: Because "It Works" Is Not a Type
Python is dynamically typed. This means you can write this:
```python
def process(data):
    return data.upper()
```
What is data? Could be a string. Could be a list. Could be your hopes and dreams. Python doesn't care. Python will try to call .upper() on anything. Python believes in you.
Python is wrong to believe in you.
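Here's that misplaced faith in action, using the untyped process from above: nothing complains until the bad call actually executes. Annotate data as str and mypy rejects the dict call before the code ever runs.

```python
def process(data):  # untyped: Python accepts anything here
    return data.upper()


print(process("hello"))  # prints HELLO

try:
    process({"name": "hello"})  # a dict sneaks in
except AttributeError as exc:
    # Discovered only when this line runs, which might be in production.
    print(f"Runtime crash: {exc}")
```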
MyPy strict mode fixes this by being incredibly annoying:
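```toml
[tool.mypy]
strict = true
# strict already turns on the three flags below;
# spelling them out just documents the intent.
disallow_untyped_defs = true
disallow_incomplete_defs = true
warn_return_any = true
```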
Now you have to actually say what things are:
```python
def process(data: str) -> str:
    return data.upper()
```
Is this more typing? Yes. Is this tedious? Also yes. Will this save you from a 4-hour debugging session because you passed a dict to a function expecting a string? Absolutely yes.
The AI also loves types. Give it typed code and it knows exactly what to generate. Give it untyped code and it hallucinates confidently. Your choice.
The Runtime Safety Net
Linters catch a lot. But not everything. Third-party libraries do weird things. Someone's "async" wrapper is actually sync. Life is full of disappointments.
So I run tests with the event loop in paranoid mode:
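```python
import asyncio
from collections.abc import Iterator

import pytest

@pytest.fixture
def event_loop() -> Iterator[asyncio.AbstractEventLoop]:
    loop = asyncio.new_event_loop()
    loop.set_debug(True)  # debug mode logs slow callbacks
    loop.slow_callback_duration = 0.1  # 100ms = you're blocking
    yield loop
    loop.close()  # don't leak the loop between tests
```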
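Anything takes longer than 100ms? asyncio logs a warning naming the callback that hogged the loop. (100ms is also asyncio's default threshold; setting it here just makes it explicit.) The catch: it's a log line, not an exception, so on its own nothing fails. Turn that warning into an assertion and the test fails. Loudly. Rudely. Exactly as it should. One more caveat: recent pytest-asyncio releases deprecate overriding the event_loop fixture, so check your version; setting PYTHONASYNCIODEBUG=1 is another way to switch debug mode on.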
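Debug mode by itself only logs a warning ("Executing ... took N seconds") on the "asyncio" logger when a callback exceeds slow_callback_duration; it doesn't raise. Here's a minimal sketch of making that warning assertable with a plain logging handler. The helper names are hypothetical, and pytest's caplog fixture would be another way to capture the same records.

```python
import asyncio
import logging
import time
from collections.abc import Callable, Coroutine
from typing import Any


class SlowCallbackCollector(logging.Handler):
    """Collects the 'Executing ... took N seconds' warnings asyncio logs in debug mode."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if message.startswith("Executing"):
            self.messages.append(message)


def slow_callbacks(
    coro_factory: Callable[[], Coroutine[Any, Any, None]],
    threshold: float = 0.1,
) -> list[str]:
    """Run a coroutine on a debug-mode loop and return any slow-callback warnings."""
    collector = SlowCallbackCollector()
    asyncio_logger = logging.getLogger("asyncio")
    asyncio_logger.addHandler(collector)
    loop = asyncio.new_event_loop()
    try:
        loop.set_debug(True)
        loop.slow_callback_duration = threshold
        loop.run_until_complete(coro_factory())
    finally:
        loop.close()
        asyncio_logger.removeHandler(collector)
    return collector.messages


async def sneaky_sync_wrapper() -> None:
    time.sleep(0.2)  # an "async" function that secretly blocks for 200 ms


async def well_behaved() -> None:
    await asyncio.sleep(0.2)  # waits without blocking the loop


# In a real test you'd write: assert not slow_callbacks(...)
print(slow_callbacks(sneaky_sync_wrapper))  # non-empty: the blocking call was caught
print(slow_callbacks(well_behaved))  # empty: nothing hogged the loop
```

The sneaky wrapper passes any test that only checks return values. Only the timing check notices it.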
Belt and suspenders. Because I've seen things. Things that work perfectly locally and explode in production. Things that pass every test and still somehow break. Plan accordingly.
The Gate
One job per check. When something fails, you know exactly what.
```yaml
.backend-quality:
  stage: quality
  image: python:3.12-slim
  variables:
    UV_CACHE_DIR: .uv-cache
  cache:
    key: uv-backend-${CI_COMMIT_REF_SLUG}
    paths:
      - .uv-cache
  before_script:
    - pip install uv
    - uv sync --frozen --no-install-project
  allow_failure: false
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

ruff:check:
  extends: .backend-quality
  script:
    - uv run ruff check .

ruff:format:
  extends: .backend-quality
  script:
    - uv run ruff format --check .

mypy:
  extends: .backend-quality
  script:
    - uv run mypy .
```
Three jobs. Same stage. Run in parallel. When one fails, you see exactly which one in the pipeline view. No scrolling through logs to find the error.
The hidden job: .backend-quality starts with a dot, so GitLab won't run it directly. It's a template. DRY without the copy-paste.
extends: each job inherits the template. Same image, same cache, same rules. Only the script changes.
Parallel execution: all three jobs run at the same time, runner capacity permitting. Faster feedback. If ruff and mypy both fail, you see both failures immediately. Fix them together instead of playing whack-a-mole.
allow_failure: false on every job: this isn't a suggestion. It's already GitLab's default for regular jobs, but spelling it out makes the intent explicit. Pair it with the project's "Pipelines must succeed" setting and your MR sits there, blocked, until all three pass.
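The same "report every failure, not just the first" idea works locally before you push. Here's a hypothetical helper script (not part of the pipeline above) that runs the three checks in sequence, keeps going after a failure, and returns the names of everything that failed. The commands mirror the CI jobs; swap in whatever your project actually runs.

```python
import subprocess
import sys

# Hypothetical local mirror of the three CI jobs.
CHECKS: dict[str, list[str]] = {
    "ruff:check": ["uv", "run", "ruff", "check", "."],
    "ruff:format": ["uv", "run", "ruff", "format", "--check", "."],
    "mypy": ["uv", "run", "mypy", "."],
}


def run_all(checks: dict[str, list[str]]) -> list[str]:
    """Run every check even if an earlier one fails; return the names that failed."""
    failed: list[str] = []
    for name, command in checks.items():
        try:
            returncode = subprocess.run(command, check=False).returncode
        except FileNotFoundError:
            returncode = 127  # tool not installed: count it as a failure, like a shell would
        print(f"{name}: {'ok' if returncode == 0 else f'FAILED (exit {returncode})'}")
        if returncode != 0:
            failed.append(name)
    return failed
```

From a script, call sys.exit(1 if run_all(CHECKS) else 0) so the exit code reflects the result. It runs the checks one after another rather than in parallel, which keeps their output from interleaving; the point is the same as in CI: every failure shows up in one run.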
The pipeline doesn't care that it's Friday at 5 PM. The pipeline doesn't care that "it works on my machine." The pipeline is the most reliable colleague you'll ever have.
Here's the full picture:
```yaml
stages:
  - quality

.backend-quality:
  stage: quality
  image: python:3.12-slim
  variables:
    UV_CACHE_DIR: .uv-cache
  cache:
    key: uv-backend-${CI_COMMIT_REF_SLUG}
    paths:
      - .uv-cache
  before_script:
    - pip install uv
    - uv sync --frozen --no-install-project
  allow_failure: false
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

ruff:check:
  extends: .backend-quality
  script:
    - uv run ruff check .

ruff:format:
  extends: .backend-quality
  script:
    - uv run ruff format --check .

mypy:
  extends: .backend-quality
  script:
    - uv run mypy .
```
Copy, paste, adapt. It works.
The Point
I could review code carefully. I could remember all the async gotchas. I could check every type hint manually.
I could also juggle chainsaws. Both are technically possible. Neither is a good idea.
The reality is: humans forget things. That's not a character flaw—that's human nature. The trick isn't to fight it. The trick is to build systems that work despite it.
I write the rules once. The machines enforce them forever. They never get tired. They never get distracted. They never think "eh, it's probably fine."
The linter catches what I forget. The type checker verifies what I assume. The pipeline blocks what I'd regret.
That's the deal.
Next up: TypeScript or Tears — Same idea, different battlefield. JavaScript lies. TypeScript doesn't.