DEV Community

Cover image for Building a Healthcare RCM Analytics API with FastAPI and PostgreSQL
luis8choa
luis8choa

Posted on

Building a Healthcare RCM Analytics API with FastAPI and PostgreSQL

Introduction

I'm a biomedical engineer who ended up doing data analysis for a Revenue Cycle Management company in the U.S. — and honestly, I wouldn't change it. There's something fascinating about the intersection of healthcare operations and data.

But after two years on the floor, one thing became painfully obvious: the people actually resolving claims — billers, coders, AR agents — are flying blind. We're working with spreadsheets, static reports that are already two days old, and gut feeling. When you're processing 60,000+ claims a month, gut feeling doesn't scale.

The bottlenecks are always the same: which payer is denying the most claims right now? Which claims have been sitting unpaid for 90+ days and are probably never going to be resolved? Who on the team has a 40% error rate that nobody has flagged yet?

This information exists. It's sitting in the database. Nobody built the tool to surface it.

So I did. This is the story of building a REST API that gives RCM teams real-time access to the KPIs that actually matter — denial rates, AR days, aging buckets, and staff productivity — using FastAPI and PostgreSQL.


Section 1: What is Revenue Cycle Management?

If you're not in healthcare, RCM might sound abstract. Let me make it concrete.

Every time a patient sees a doctor, the provider needs to get paid. That payment doesn't come directly from the patient in most cases — it comes from an insurance company (called a payer). The process of getting that money from the payer is Revenue Cycle Management.

The claim cycle looks like this:

Medical service rendered
        ↓
Medical coding (CPT codes assigned to each procedure)
        ↓
Claim submitted to payer
        ↓
Payer reviews the claim
        ↓
Paid / Denied / Pending / Appealed
Enter fullscreen mode Exit fullscreen mode

Two numbers tell you almost everything about the health of this cycle:

Denial Rate — the percentage of claims a payer rejects. A high denial rate with a specific payer usually signals a billing accuracy problem, a credentialing issue, or a payer policy change nobody caught. If BlueCross is denying 35% of your claims and Aetna is only denying 8%, that's where your team should be focused.

AR Days — how many days on average it takes from claim submission to payment. The industry benchmark is 40 days. Above that, cash flow starts to suffer. Above 90 days, you're probably not getting paid at all.

The problem isn't that this data doesn't exist — it does. The problem is that when you're managing 60,000+ claims, nobody has a clean, real-time view of it. Directors and managers are making staffing and process decisions based on reports that are already outdated.

That's the gap this API fills.


Section 2: Architecture Decisions

When I started thinking about how to build this, I had three requirements: it had to be fast to query, easy for other developers to understand and extend, and it had to separate the business logic cleanly from the HTTP layer.

Why FastAPI over Flask or Django

FastAPI generates interactive API documentation automatically from your code. For a tool that other analysts or developers might consume, that matters enormously — the /docs endpoint gives you a working Swagger UI with zero extra configuration. Flask would have required a separate library for that. Django would have been overkill for an API-only project.

FastAPI also validates request and response data automatically using Python type hints, which catches entire categories of bugs before they reach production.

The Service Layer Pattern

The most important architectural decision was separating business logic into a services/ layer completely independent from the HTTP routing layer.

routers/claims.py             receives HTTP request, validates parameters
services/claims_service.py    contains the actual calculation logic
Enter fullscreen mode Exit fullscreen mode

This means two things in practice. First, the alert system can call get_denial_rate() directly from claims_service without making an internal HTTP request — it just calls the Python function. Second, testing the denial rate calculation doesn't require simulating an HTTP request — you call the function directly with a database session.

SQLAlchemy 2.0 with Pydantic v2

SQLAlchemy handles the relationship between Python objects and PostgreSQL tables. Pydantic handles the relationship between Python objects and JSON. They solve different problems and that's why you need both.

A SQLAlchemy model says: "this is how a claim is stored in the database — all columns, all relationships, including internal fields like hashed_password."

A Pydantic schema says: "this is what a claim looks like when it enters or leaves the API — only the fields the client should see."

The separation is what prevents accidentally exposing hashed_password in a response.


Section 3: The Denial Rate Calculation

This is the query I'm most proud of in the project — not because it's complex, but because it took me a while to realize it could be done in a single database round trip.

My first instinct was two separate queries:

# Query 1 — total claims per payer
total = db.query(Claim.payer, func.count(Claim.id)).group_by(Claim.payer).all()

# Query 2 — denied claims per payer
denied = db.query(Claim.payer, func.count(Claim.id))\
           .filter(Claim.status == "denied")\
           .group_by(Claim.payer).all()
Enter fullscreen mode Exit fullscreen mode

Two trips to the database. Two result sets to merge in Python. More code, more room for error.

The better approach uses a conditional SUM with CASE WHEN — a single query that counts everything at once:

results = db.query(
    Claim.payer,
    func.count(Claim.id).label("total"),
    func.sum(
        case((Claim.status == "denied", 1), else_=0)
    ).label("denied"),
).group_by(Claim.payer).all()
Enter fullscreen mode Exit fullscreen mode

The SQL this generates:

SELECT payer,
       COUNT(id) AS total,
       SUM(CASE WHEN status = 'denied' THEN 1 ELSE 0 END) AS denied
FROM claims
GROUP BY payer
Enter fullscreen mode Exit fullscreen mode

The CASE WHEN assigns 1 to every denied claim and 0 to everything else. SUM adds those up. One query, one database round trip, all the data you need.

The denial rate is then calculated in Python:

"rate": round((row.denied or 0) / row.total * 100, 2)
Enter fullscreen mode Exit fullscreen mode

The or 0 handles the edge case where SUM returns NULL for payers with zero denials — in SQL, summing an empty set returns NULL, not 0.


Section 4: Testing Strategy

Testing an API that talks to a database introduces a fundamental problem: you can't run tests against your production database, but you also can't test database logic without a database.

The solution has two parts.

SQLite for local tests

Instead of requiring a running PostgreSQL server to run tests, the test suite uses SQLite — a file-based database that needs no server. Each test creates all tables, runs, and drops everything:

@pytest.fixture(autouse=True)
def setup_db():
    Base.metadata.create_all(bind=engine_test)  # create tables
    yield                                         # run the test
    Base.metadata.drop_all(bind=engine_test)     # drop everything
Enter fullscreen mode Exit fullscreen mode

Every test starts with a completely clean database. No data contamination between tests.

FastAPI's dependency_overrides

The bridge between "my app talks to PostgreSQL" and "my tests talk to SQLite" is FastAPI's dependency override system:

app.dependency_overrides[get_db] = override_get_db
Enter fullscreen mode Exit fullscreen mode

This single line tells FastAPI: "whenever any endpoint asks for get_db, give it override_get_db instead." Every endpoint in the application — without modifying a single line of production code — now talks to SQLite during tests.

The 80% coverage threshold

The CI pipeline fails if test coverage drops below 80%. This isn't about achieving a number — it's about making coverage degradation visible. If someone adds a new service function and forgets to write tests for it, the pipeline catches it before the code reaches main.

The current coverage is 93%.


Conclusion

Two years working in RCM gave me a very specific frustration. Ten weeks of building gave me a very specific solution.

The most valuable part of this project wasn't any particular technical decision — it was having to translate operational problems into database queries and vice versa. That translation is hard to fake in an interview, and hard to build without having been on both sides of it.

The project is open source:
🔗 github.com/luis8choa/RCM-Analytics-API

What's next: a frontend dashboard to visualize the KPIs, and a production deploy so billing teams can actually use it.

If you work in healthcare operations and this resonates — or if you have feedback on the technical side — I'd genuinely like to hear from you.

Top comments (0)