If you are obsessed with the Quantified Self movement, you likely face the "Siloed Health Data" problem. I wear a Whoop strap for recovery and a Garmin watch for GPS tracking. Both are elite tools, but they live in separate universes. Garmin tells me my "Body Battery" is 80, while Whoop says my "Recovery" is 45%. Who is right?
To solve this, I built a custom data-engineering pipeline that aligns these cross-platform physiological metrics on a single timeline in near real time. By pairing a time-series database (InfluxDB) with the visualization power of Grafana, we can finally see heart rate variability (HRV), sleep stages, and strain scores in one unified view.
In this tutorial, we will build a Dockerized ETL pipeline using Python to fetch, normalize, and store your health data for advanced correlation analysis.
## The Architecture: From APIs to Insights
Before we dive into the code, let’s look at how the data flows from your wrist to your dashboard. We use a Python-based ETL (Extract, Transform, Load) service that polls data from wearable APIs and pushes them into InfluxDB.
```mermaid
graph TD
    subgraph Wearables
        A[Whoop API]
        B[Garmin Connect API]
    end
    subgraph Data Pipeline
        C[Python ETL Service]
        D[(InfluxDB)]
    end
    subgraph Visualization
        E[Grafana Dashboard]
    end
    A -->|JSON| C
    B -->|JSON| C
    C -->|Write Protocol| D
    D -->|Flux Query| E
```
## Prerequisites
To follow along, you'll need:
- Docker & Docker Compose installed.
- Python 3.9+
- API credentials for Whoop (a developer account) and Garmin (via the community `garminconnect` wrapper).
- A passion for data engineering and optimization!
## Step 1: Setting Up the Infrastructure
We’ll use Docker Compose to spin up our stack. This ensures our environment is reproducible and isolated.
```yaml
# docker-compose.yml
version: '3.8'
services:
  influxdb:
    image: influxdb:2.7
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password123
      - DOCKER_INFLUXDB_INIT_ORG=my-bio-hacking
      - DOCKER_INFLUXDB_INIT_BUCKET=health_metrics
      # Pre-seed the API token so the ETL script below can authenticate
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-super-secret-token
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    depends_on:
      - influxdb
volumes:
  influxdb-data:
```
## Step 2: The Python ETL Pipeline
The magic happens in the Python ETL script, and the key step is normalization. Garmin reports "Body Battery" as an integer from 0 to 100, while Whoop reports "Recovery" as a fraction from 0.0 to 1.0. To overlay the two on a single axis, we convert everything to a common 0-100 scale before writing it to InfluxDB.
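Before wiring up the database client, it helps to make that conversion explicit. Here is a minimal sketch of such a helper (the function name and default range are my own choices, not part of either vendor's API):

```python
def normalize_to_percent(value, source_min=0.0, source_max=1.0):
    """Map a raw reading onto a 0-100 scale so metrics from
    different wearables share one axis in Grafana."""
    span = source_max - source_min
    if span <= 0:
        raise ValueError("source_max must be greater than source_min")
    return (value - source_min) / span * 100.0

# Whoop recovery arrives as 0.0-1.0, Garmin Body Battery as 0-100:
print(normalize_to_percent(0.65))                              # -> 65.0
print(normalize_to_percent(72, source_min=0, source_max=100))  # -> 72.0
```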
Here is a simplified version of our data ingestor:
```python
import time

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Configuration
token = "YOUR_INFLUX_TOKEN"
org = "my-bio-hacking"
bucket = "health_metrics"

client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)
write_api = client.write_api(write_options=SYNCHRONOUS)


def log_metrics(source, metric_name, value):
    """Standardizes and writes a single metric point to InfluxDB."""
    point = (
        Point("physiological_metrics")
        .tag("source", source)
        .field(metric_name, float(value))
        .time(time.time_ns(), WritePrecision.NS)
    )
    write_api.write(bucket=bucket, org=org, record=point)
    print(f"✅ Logged {metric_name} from {source}: {value}")


# Mocking the API poll loop
if __name__ == "__main__":
    while True:
        # In a real scenario, call whoop_api.get_recovery().
        # Whoop reports 0.0-1.0, so scale to 0-100 to match Garmin.
        log_metrics("Whoop", "recovery_score", 0.65 * 100)
        # Call garmin_api.get_body_battery()
        log_metrics("Garmin", "body_battery", 72.0)
        time.sleep(300)  # Poll every 5 minutes
```
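A real deployment also has to survive flaky wearable APIs and network hiccups. One common pattern is to wrap each poll in a retry with exponential backoff. A sketch, with the fetch function here standing in for a real API call:

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on failure with exponential backoff.
    Returns None after exhausting retries so one bad poll does not
    kill the whole loop."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:
            if attempt == retries - 1:
                print(f"⚠️ Giving up after {retries} attempts: {exc}")
                return None
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Example: a mock API that fails twice, then succeeds
calls = {"n": 0}

def flaky_recovery():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Whoop API timeout")
    return 0.65

print(fetch_with_retry(flaky_recovery, sleep=lambda s: None))  # -> 0.65
```

Passing `sleep` as a parameter keeps the helper testable; in production you would leave the default `time.sleep` in place.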
## Step 3: Aligning the Metrics in Grafana
Once the data is flowing into InfluxDB, head to http://localhost:3000.
- Add InfluxDB as a data source using Flux as the query language.
- Create a new dashboard and use the following Flux query to overlay your recovery metrics:
```flux
from(bucket: "health_metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeEnd)
  |> filter(fn: (r) => r["_measurement"] == "physiological_metrics")
  |> filter(fn: (r) => r["_field"] == "recovery_score" or r["_field"] == "body_battery")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```
Now you can see if your Garmin Body Battery actually correlates with your Whoop Recovery score!
## Why This Matters
Standardizing data in a Time-series Database allows you to perform cross-correlation that mobile apps simply don't offer. You can start asking questions like: "Does my Garmin 'Stress Score' peak 2 hours before my Whoop 'Strain' increases?" or "How does my caffeine intake (logged via another API) affect my HRV across both devices?"
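To answer questions like these quantitatively, you can export the two aligned series (for example, via the Flux query above) and compute a lagged cross-correlation offline. A minimal sketch in plain Python, with synthetic data standing in for real exports:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlation of x against y shifted `lag` samples forward
    (lag > 0 means x leads y)."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return pearson(x, y)


# Synthetic hourly series: "strain" echoes "stress" two hours later
stress = [50 + 10 * math.sin(t / 5) for t in range(200)]
strain = stress[:2] + stress[:-2]  # shifted right by 2 samples

best_lag = max(range(-6, 7), key=lambda k: lagged_correlation(stress, strain, k))
print(best_lag)  # -> 2, i.e. stress leads strain by two hours
```

Sweeping the lag like this over real Garmin and Whoop exports would tell you not just whether the two scores agree, but how far apart in time they move.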
For those looking to scale this into a multi-user platform, note that storing other people's health data brings real compliance obligations (HIPAA in the US, GDPR in the EU), so treat this setup as a personal tool rather than a production health service.
## Conclusion
Building your own health data warehouse isn't just for data nerds—it's for anyone who wants total ownership over their biological data. By using Docker, InfluxDB, and Grafana, we’ve moved from fragmented apps to a unified command center.
What metrics are you tracking? Drop a comment below or share your custom Grafana dashboard screenshots!