I built recovery-trail, a small local-first Apple Health viewer that answers one specific training question:
Should I push today, or should I pull back?
The part I like most is what it does not use.
There is no backend. There is no account. There is no upload. There is no LLM reading your health data and inventing a confident-sounding paragraph.
It is just a browser app, a web worker, some trend math, and a rule trace.
You drop in an Apple Health export.xml, it parses HRV, resting heart rate, sleep, and workout load locally, then it produces a verdict: standard, caution, or deload.
The app is live here:
https://conalh.github.io/recovery-trail/
The source is here:
https://github.com/Conalh/recovery-trail
There is also a sample-data button, so you can try it without using your own export.
Why not use an LLM?
I like LLMs. I use them a lot. This was not a good place for one.
Recovery data is already noisy. HRV can bounce around. Sleep data can be wrong. Workout load can look scary if you only look at one week. If the final layer is also probabilistic, you now have noise on top of noise.
For this project, I wanted the opposite:
- same input, same verdict
- every fired rule visible
- every threshold inspectable
- no natural-language explanation that hides the actual reason
- no model touching private health data
So the app is closer to a tiny rules engine than a chatbot.
The UI can still produce a narrative line, but the verdict itself comes from deterministic logic gates:
- did HRV trend down?
- did resting HR trend up?
- did sleep drop?
- did acute workload spike relative to chronic load?
- did enough signals fire across enough metrics to call it a recovery stack issue?
That last phrase, "logic gates," is the part that makes it fun. The app does not "feel" like it works because it generated a plausible answer. It works because the pieces line up: signal in, math applied, rule fired, evidence shown.
The data stays in the browser
Apple Health exports can be huge. A real export.xml can be hundreds of megabytes. I did not want the app to freeze while reading it, and I definitely did not want to upload that file anywhere.
So parsing happens in a web worker.
The worker streams the file with file.stream().pipeThrough(new TextDecoderStream()), keeps a sliding text buffer, and scans only the tags the app needs:
HKQuantityTypeIdentifierHeartRateVariabilitySDNNHKQuantityTypeIdentifierRestingHeartRateHKCategoryTypeIdentifierSleepAnalysisWorkout
That gives the UI progress updates while the worker extracts just enough structure for the rule layer:
- HRV samples
- resting heart rate samples
- sleep intervals
- workout durations
It is intentionally boring architecture. The file never leaves the page. The worker returns a normalized in-memory object. The main React app evaluates it and renders the briefing.
The core idea: levels and trends are different
One thing I wanted to avoid was treating every metric as a simple red/yellow/green status.
For recovery, a level and a trend are different signals.
Example:
- HRV can be below baseline right now.
- HRV can also be trending downward even if it has not crossed a scary threshold yet.
Those should not be collapsed into the same check.
So the app runs both:
- level rules: where is the metric now?
- trend rules: where is the metric going?
Level rules catch things like:
- 7-day HRV below a 28-day baseline
- 7-day resting HR above a 28-day baseline
- sleep below target
- acute:chronic workload ratio too high
Trend rules catch direction of travel.
The trend logic is ported from my trainer-facing companion project, fit-ontology, and uses two windows side by side.
The dual-window trend detector
Each recovery signal runs two slope estimators:
- Acute: 7-day ordinary least squares slope on the raw daily series.
- Chronic: 28-day EWMA-smoothed series, then ordinary least squares on that smoothed line.
Both slopes are normalized by the 28-day baseline standard deviation, so the threshold is in SD/day rather than raw units.
That matters because "2 ms of HRV per day" and "2 bpm of resting heart rate per day" should not be interpreted on the same raw scale. Normalizing makes the trend detector speak in relative movement against the user's own baseline.
The 7-day detector is responsive but noisy. The 28-day detector is slower but better at catching sustained drift.
Then a combiner resolves the two:
- acute-only fires: demote one band
- chronic-only severe fires: surface as mild
- chronic stronger than acute: promote one band
- chronic confirms acute: trust the acute band
- neither fires: no trend rule
That gives the app a useful personality: it reacts to sharp changes, but it does not panic just because one short window got weird.
There is one more safety check. If the composite recovery score is still high, trend signals get demoted again. In other words, if the level picture is excellent, borderline trend math should not dominate the final verdict.
That is the kind of rule I like because it is explicit. You can argue with it. You can tune it. You can test it. It is not hidden behind a prompt.
The verdict is just the maximum fired severity
After the rules run, the final verdict is simple:
- no serious rules:
standard - caution-level rule:
caution - deload-level rule:
deload
If enough rules fire across enough different metrics, the app synthesizes a meta-rule:
Recovery stack is down across the board.
That rule is not mystical. It only appears when at least three rules fire across at least three metrics. It is just a framing layer so the UI can say, "this is not one isolated marker."
The important part is that the rule list stays visible. A verdict without evidence is not very useful. A verdict with every fired rule, window badge, and slope value is something you can actually inspect.
The interface is a briefing, not a dashboard dump
The main view is intentionally compact.
Four rows:
- HRV
- resting HR
- sleep
- load
Each row shows daily cells against baseline. Teal means better than baseline. Rust means worse. The right side shows today's value and percentage change.
Below the heatmap, the app writes a short narrative line like:
Through 5/19 everything was at baseline. Then all four metrics rolled over at once and stayed there.
Then it lists the rules that fired.
Each trend rule shows whether the 7-day detector, 28-day detector, or both fired. It also prints the raw SD/day slope numbers. Tapping a rule focuses the relevant metric row. Tapping a metric expands it into a small SVG line chart with the baseline overlay.
There are no charting dependencies. The sparklines and metric chart are hand-rolled SVG because the data shape is narrow and the interaction is specific.
The project is built with:
- Vite
- React 19
- TypeScript
- Tailwind
- a single web worker
- zero backend
- zero analytics
- zero tracking
What worked out
The satisfying part is that the deterministic approach still feels alive.
The sample data is synthetic, but it is shaped like a real recovery slide: HRV down, resting HR up, sleep down, and load elevated. The app catches the rollover and produces a deload verdict. More importantly, it shows why.
That is the win for me.
Not "AI says you should rest."
More like:
HRV is off baseline across both the 7-day and 28-day windows. Resting HR is rising in the acute window. Sleep is below target. Multiple metrics are down at once.
That is less magical and more useful.
Things I would still improve
The current version is exploratory. It is not medical advice, and it is not trying to prescribe training for everyone.
The next pieces I would like to improve are:
- more fixture coverage around the trend combiner
- better handling for sparse Apple Health exports
- clearer language for "standard" days, not just bad days
- optional export/share of the rule trace
- more tuning against real-world edge cases
There is also a small lint cleanup left in the repo right now, even though the production build passes.
Try it
Live app:
https://conalh.github.io/recovery-trail/
Source:
https://github.com/Conalh/recovery-trail
Click "Try with sample data" if you just want to see the briefing.
If you use your own Apple Health export, the data stays local. The browser parses it, the rules run client-side, and the result is just a visible trail of math.
That is the whole point of the project: not a black box, not a model, just an explainable recovery trail.

Top comments (0)