buildbasekit

I’m crash testing FiloraFS-Lite under load (p95, pressure, failures)

I started running crash tests on FiloraFS-Lite to see how it actually behaves under pressure.

Not benchmarks. Not ideal conditions.
Real stress.

The focus is simple:

  • where it starts breaking
  • how early signals show up (p95, pressure)
  • what fails first under sustained load

What I’m testing right now

  • increasing RPM until the system shows pressure
  • tracking how latency (p95) degrades
  • observing write pressure under continuous load (rough harness sketch below)
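
To make the setup concrete, here's a minimal sketch of the ramp loop, assuming a plain HTTP endpoint. The URL, payload, and RPM schedule are placeholders, not FiloraFS-Lite's actual API or my real harness (which runs requests concurrently rather than one at a time).

```python
# Minimal, sequential sketch of the ramp loop described above.
# ENDPOINT, the payload, and RPM_STEPS are placeholders -- the real harness
# runs concurrently and targets FiloraFS-Lite's actual upload path.
import time
import urllib.request

ENDPOINT = "http://localhost:8080/upload"   # hypothetical target
RPM_STEPS = [500, 1000, 1500, 3000, 6000]   # ramp schedule (illustrative)
STEP_DURATION_S = 60                        # hold each level for one minute

def timed_request() -> float:
    """Fire one request and return its latency in milliseconds."""
    start = time.perf_counter()
    try:
        urllib.request.urlopen(ENDPOINT, data=b"x" * 1024, timeout=10).read()
    except Exception:
        pass  # errors are tracked separately in the real harness
    return (time.perf_counter() - start) * 1000

for rpm in RPM_STEPS:
    interval = 60.0 / rpm                   # target gap between requests
    latencies = []
    deadline = time.monotonic() + STEP_DURATION_S
    while time.monotonic() < deadline:
        latencies.append(timed_request())
        time.sleep(interval)
    latencies.sort()
    p95 = latencies[max(0, int(len(latencies) * 0.95) - 1)]
    print(f"{rpm} RPM -> p95 {p95:.0f} ms over {len(latencies)} requests")
```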

I already have logs from initial runs.

But instead of rushing to conclusions, I’m validating the signals properly before sharing anything.

No assumptions. Just observed behavior.


Why this matters

Small-scale tests looked fine.

But real systems don’t fail in clean scenarios.

They fail when:

  • load spikes
  • resources get constrained
  • edge cases stack together

That’s the environment I’m trying to simulate.


What I’ll share next

Once the analysis is done, I’ll publish:

  • what broke first
  • early warning signals
  • what actually mattered vs noise
  • what needs to change

If you’ve done similar crash or stress testing:

What signal usually shows up first for you under load?

Top comments (2)

buildbasekit

Ran a 10-minute load test before pushing further.

Interesting part: p95 latency started drifting earlier than expected, even though the overall system looked stable. Write pressure also kept building quietly in the background.

Now moving into full crash tests to see where it actually breaks and whether these signals consistently show up beforehand.
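
For context, this is roughly how I'm checking for that drift: compare each window's p95 against the first window's baseline. The input file and its format here (one latency value in ms per line) are stand-ins for illustration, not FiloraFS-Lite's actual log output.

```python
# Rolling-window p95 drift check. The input file and its format are
# placeholders (one latency-in-ms value per line), used only to illustrate
# the comparison against the first window's baseline.
from pathlib import Path

WINDOW = 200          # samples per window
DRIFT_FACTOR = 1.25   # flag windows whose p95 exceeds baseline by 25%

def p95(values: list[float]) -> float:
    ordered = sorted(values)
    return ordered[max(0, int(len(ordered) * 0.95) - 1)]

samples = [float(x) for x in Path("latencies_ms.log").read_text().split()]
windows = [samples[i:i + WINDOW] for i in range(0, len(samples), WINDOW)]
baseline = p95(windows[0])

for n, w in enumerate(windows):
    current = p95(w)
    flag = "  <-- drifting" if current > baseline * DRIFT_FACTOR else ""
    print(f"window {n}: p95 {current:.0f} ms{flag}")
```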

buildbasekit

Update from crash test runs: clear pattern emerging now.

p95 stays stable (~250–300ms) up to ~1500 RPM, then starts degrading rapidly as load increases. By ~6000+ RPM, latency crosses 1s and spikes toward ~1.8s, even though there are still zero errors.

The bottleneck is showing up in disk I/O (the upload path), while other APIs remain stable. So the system doesn’t “fail” first; it just slows down heavily under write pressure.

This is interesting because the failure signal is purely latency-driven, not resource exhaustion or errors.
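
If it helps anyone automate the same check, here's a tiny sketch that flags the tipping point purely on latency, even at zero errors. The RPM and p95 numbers below are rough illustrations of the figures above, not the final results, and the 1s budget is an assumed SLO.

```python
# Classify each load step: latency-only degradation vs. hard failure.
# The numbers are illustrative approximations of the runs described above,
# not the published results; the 1s budget is an assumed SLO.
LATENCY_BUDGET_MS = 1000

# (rpm, p95_ms, error_rate) per load step
steps = [
    (1500, 280, 0.0),
    (3000, 600, 0.0),
    (6000, 1100, 0.0),
    (8000, 1800, 0.0),
]

for rpm, p95_ms, error_rate in steps:
    if error_rate > 0:
        status = "hard failure"
    elif p95_ms > LATENCY_BUDGET_MS:
        status = "degraded (latency-only)"   # zero errors, but the SLO is gone
    else:
        status = "ok"
    print(f"{rpm:>5} RPM  p95={p95_ms:>4} ms  errors={error_rate:.1%}  -> {status}")
```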

I’ll publish a detailed breakdown soon with full analysis, graphs, and what actually caused the tipping point.