Oscar Rieken

Building a FastAPI + Vue 3 research platform: the 4 bugs that almost broke Phase 1

Phase 1 of NumPath is done. Seven of eight Definition of Done items are checked — the eighth requires real children completing pilot sessions, which no amount of code will substitute for. The stack runs cleanly in Docker Compose, 56 unit tests pass, and a student can log in, answer ten problems, and see their knowledge state update in real time.

What the commit history doesn't show is the afternoon I spent fighting four bugs that don't appear in any FastAPI or Vue tutorial. This post is that afternoon.

What We Built

NumPath is an adaptive math tutor for children with dyscalculia. Phase 1 ships the minimum research instrument: a student practice loop, a rule-based adaptive engine, and a read-only teacher dashboard. No ML yet — just clean infrastructure and a data collection pipeline capable of generating the 150+ attempt records that Phase 2 needs to train the BKT model.

The stack: FastAPI 0.110 + SQLAlchemy 2 + Alembic + asyncpg on the backend; Vue 3 + Tailwind + Pinia on the frontend; PostgreSQL 16 + Redis 7 in Docker Compose.

Bug 1: passlib AttributeError on bcrypt ≥4.1

The symptom was immediate on first login attempt:

AttributeError: module 'bcrypt' has no attribute '__about__'

passlib has a version check that reads bcrypt.__about__.__version__. bcrypt 4.1 removed the __about__ module (the 4.0.x releases still shipped it), so the check fails on any current bcrypt release. The libraries have been incompatible for two years and passlib is effectively unmaintained.

The fix: delete passlib entirely. Replace it with two small functions that call bcrypt directly:

# backend/auth/password.py
import bcrypt

def hash_password(plain: str) -> str:
    return bcrypt.hashpw(plain.encode(), bcrypt.gensalt()).decode()

def verify_password(plain: str, hashed: str) -> bool:
    return bcrypt.checkpw(plain.encode(), hashed.encode())

pyproject.toml: swap "passlib[bcrypt]>=1.7.4" for "bcrypt>=4.0.0". Done. Don't reach for passlib on new Python projects — the dependency is dead.

Bug 2: pnpm 10 security policies blocking Docker builds

The frontend Dockerfile used node:20-slim and installed the latest pnpm via corepack. When pnpm 10 shipped, the build started failing with:

ERR_PNPM_PREPARE_PKG_FAILURE  Error when preparing the package
 Blocked by policy: electron-to-chromium@1.5.134 is not allowed
 because it was released 0 days ago (policy: minimumReleaseAge=3 days)

pnpm 10 introduced release-age security policies that refuse to install packages published within the last N days. A reasonable feature in production — a CI-breaking surprise when your lock file pins a package that was published yesterday.

Two separate policies hit us: minimumReleaseAge and ignored-builds (which blocks esbuild and vue-demi unless explicitly allowed). The package.json "pnpm" field that's supposed to configure these policies is silently ignored in pnpm 10 — it logs a warning and reads nothing.

The fix: pin to pnpm 9:

FROM node:22-slim
RUN corepack enable && corepack prepare pnpm@9.15.9 --activate

pnpm 9 has no release-age policies. The upgrade to pnpm 10 can wait until the project has a proper CI environment to absorb the breaking change.

Bug 3: FastAPI container connecting to localhost instead of postgres

The backend started cleanly. Every database call returned:

asyncpg.exceptions.ConnectionRefusedError: connection refused (host 127.0.0.1, port 5432)

The DATABASE_URL in .env was postgresql+asyncpg://numpath:numpath@localhost:5432/numpath. Inside a Docker Compose network, localhost is the container's own loopback — not the postgres service. The postgres container is reachable by its service name.

The fix: override the env var at the service level in docker-compose.yml:

backend:
  env_file: ../.env
  environment:
    DATABASE_URL: postgresql+asyncpg://numpath:numpath@postgres:5432/numpath
    REDIS_URL: redis://redis:6379/0

The environment block wins over env_file, so local development (which uses localhost) keeps working. Containers talk to each other by service name.
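A quick way to confirm which host a process will actually dial is to parse the URL it sees at startup. This is a minimal sketch, not NumPath's code: the helper name database_target is hypothetical, and the fallback URL is assumed to match the localhost value from .env quoted above.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

# Assumed fallback: the localhost URL from .env, used for local development.
LOCAL_DEFAULT = "postgresql+asyncpg://numpath:numpath@localhost:5432/numpath"


def database_target(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[int]]:
    """Return the (host, port) that DATABASE_URL points at (hypothetical helper)."""
    env = os.environ if env is None else env
    parts = urlsplit(env.get("DATABASE_URL", LOCAL_DEFAULT))
    return parts.hostname, parts.port


# Inside the backend container, the Compose override should make this "postgres".
host, port = database_target()
print(f"database host={host} port={port}")
```

Logging this once at startup makes a wrong host visible before the first request fails.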

Bug 4: SQLAlchemy column defaults not applied at construction time

This one cost the most time. POST /attempts returned a 500:

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'

The BKT update equation was subtracting p_learn from an integer, and p_learn was None. The KCStateRecord model had:

class KCStateRecord(Base):
    p_learn: Mapped[float] = mapped_column(Float, default=0.3)
    p_guess: Mapped[float] = mapped_column(Float, default=0.2)
    p_slip:  Mapped[float] = mapped_column(Float, default=0.1)

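The bug: SQLAlchemy's default= is an INSERT-time default. It is evaluated on the Python side when the row is flushed, not when the object is constructed (server_default= is the separate, server-side variant). When you construct KCStateRecord() in Python and haven't flushed to the database yet, those columns are None on the Python object. The domain code ran immediately after construction, before any flush.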

The fix: set defaults explicitly in the constructor, then flush and refresh before returning:

record = KCStateRecord(
    student_id=student_id,
    skill_id=skill_id,
    p_mastery=0.1,
    p_learn=0.3,
    p_guess=0.2,
    p_slip=0.1,
    opportunity_count=0,
)
self._db.add(record)
await self._db.flush()         # write to DB so defaults are applied
await self._db.refresh(record) # re-read the DB-populated values

The rule: if you use a newly constructed SQLAlchemy model object before any flush, assume every default= column is None. Either set defaults in the constructor or flush first.

What the BKT update looks like in practice

With those bugs cleared, the full attempt flow works end to end. A correct answer on a SUB_BORROW problem with a fresh KCState shows:

before: p_mastery=0.100, opportunity_count=0
after:  p_mastery=0.533, opportunity_count=1

That 0.1 → 0.533 jump is the Bayesian update working: the prior p_mastery is first revised in light of the correct answer, using p_guess and p_slip (0.1 → 0.333), and then the p_learn transition is applied (0.333 → 0.533). The math is covered in detail in Bayesian Knowledge Tracing in 37 lines of Python.
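For readers who want to check the arithmetic, here is a minimal sketch of one update step. It is the textbook BKT form, not NumPath's implementation (which this post does not show); the function name bkt_update is an assumption. With the parameter values from the constructor above, it reproduces the 0.533 figure.

```python
def bkt_update(
    p_mastery: float,
    correct: bool,
    p_learn: float,
    p_guess: float,
    p_slip: float,
) -> float:
    """One textbook BKT step: Bayes update on the answer, then the learning transition."""
    if correct:
        # Mastered and did not slip, versus not mastered and guessed.
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        # Mastered but slipped, versus not mastered and did not guess.
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    # Chance of learning the skill during this opportunity.
    return posterior + (1 - posterior) * p_learn


after = bkt_update(0.1, True, p_learn=0.3, p_guess=0.2, p_slip=0.1)
print(f"after: p_mastery={after:.3f}")  # after: p_mastery=0.533
```

The intermediate posterior is 0.09 / 0.27 ≈ 0.333; adding the learning transition, 0.667 × 0.3 = 0.2, gives 0.533.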

Why It Matters for the Research

Phase 1's job was never to be elegant — it was to be instrumented. Every attempt record written to the attempts table is a training signal for Phase 2's BKT parameter estimation. We need ≥150 records (5 students × 3 sessions × 10+ problems) before Phase 2 can begin.

The bugs above are why research-grade software is harder than it looks. Each one fails in a different way: password hashing fails outright on the first login (detectable), the pnpm policy breaks the Docker build (detectable), Docker networking lets the backend start cleanly and then refuses every database call (detectable but subtle), and SQLAlchemy defaults produce None BKT parameters (the kind of bug that corrupts ML inputs and is hard to detect in test data).

The way to catch all of them is the same: run the full stack. Not unit tests. Not import my_function; print(my_function()). Start the containers, log in as a real user, and watch what happens.

What We Learned

The honest retrospective:

Seed data is harder than it looks. Writing 60 hand-crafted math problems at three difficulty levels takes longer than writing the adaptive engine. Every problem needs a machine-checkable answer, a hint, and a calibrated difficulty score.

Docker Compose env_file + environment is the right pattern. env_file carries the defaults; environment carries container-specific overrides. The pattern is obvious in hindsight and invisible until you need it.

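The flush() + refresh() pattern is load-bearing for async SQLAlchemy. Any code that creates an ORM object and immediately passes it to domain logic needs explicit constructor values or an explicit flush. Autoflush only runs before a query, so it never fires between construction and the first attribute read, and in the async path every step that touches the database has to be awaited explicitly.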

What's Next

Phase 2: BKT parameter estimation from real student data, and a mistake classifier that categorises subtraction errors beyond "wrong." The attempts table is waiting.

Key Takeaways

  • passlib is dead: use bcrypt directly; it's two functions and no transitive dependency risk
  • Docker Compose containers reach each other by service name, not localhost; override DATABASE_URL in the environment block rather than the env_file
  • SQLAlchemy default= columns are None on a freshly constructed Python object until after a flush() + refresh() — always set constructor defaults explicitly when domain code runs immediately after creation
