Loop is my pet project. Pet projects are sandboxes where any sandcastles can be built with the fastest iteration cycle.
To learn new material, I always need real-life examples and practice.
If I read a book explaining what 3NF is, then the best way for me to understand it is to check some real, physical tables.
In general, knowledge is really valuable when applied to real work.
However, the pet project, as I said above, is my laboratory where I conduct experiments and put new ideas to initial tests.
I've needed my personal data warehouse dedicated to learning for a long time, and decided to start with the following idea:
- I have Garmin watches and I'm tracking time spent on work, pet projects, learning new things, and gaming
- So let's collect, model, and transform this data in a data warehouse built from scratch
- Then let's build some dashboards on top of this data
These are examples of questions on which I want to get the answers from time to time:
- How much time did I spend by day/week/month working, learning, or gaming?
- How many workouts did I have by day/week/month?
- When did I first start to workout? How consistent am I?
- What are my median/average bedtime and wake-up times by week/month?
I'm a Data Engineer, so my hammer will be used for the nails:
- S3 and Postgres for the raw and detailed layers
- Apache Spark for data transformations
- delta.io for the staging layer
- Dagster for the orchestration
- Metabase to build dashboards
In this and the following posts I'm going to tell how the thing was built.
Here's the final result – a dashboard which looks like this (screenshots from my Metabase):
You may see some flaky graphs like "Sleep Fall Asleep", I decided to leave it as is, otherwise I will never publish this post.
There is no limit to perfection.
Why "Loop"?
Don't confuse with Polar's Loop device!
I didn't have time to trademark the project name, so I'm sticking with "Loop" which sounds like "Whoop".
I used a Whoop band for nearly 9 months and loved it.
The device is impressive: it works more than 1-2 days(hey Apple Watch), doesn't have screen, and is solely focused on some magical "insights".
Anyway, now I have Garmin and I'm absolutely happy.
One day I'll implement something what Whoop does, like analyzing correlations between Garmin metrics and working hours.
I also plan to add Google Forms for daily mood and habits tracking in the future.
Sources
Let's start by investigating what data we can collect.
Time tracking – EARLY
I'm tracking my time. What I'm tracking: working time, time spent on pet projects, on reading technical books, on learning languages, playing video games.
I track only deep and focused work, fully commiting to what I'm doing in the moment.
If spend 8 hours in the office then 5-6 hours will be tracked.
It's just a metric. Maybe a large amount of hours can lead to better results, maybe not, there are no guarantees.
It's the same as possession in football, number of git commits made, HRV, or the "stress level".
Anyway, with this data, trends can be spotted:
- How much time I spent playing video games this week
- Time spent on a meetings or in the focusing mode
- I read a new book about databases, how much time it took
EARLY(formerly Timeular) is the time tracking app I use, paying for a subscription.
It's simple, minimal, and does 95% of what I expect.
One of the selling points for EARLY is an API.
They also sell Time Tracking Cube – physical cube (I have one),
which you can use if you want physical feedback on tracking.
This is my current folders and activities layout.
The hierarchy is rather simple and works perfectly for me, covering everything I'm doing on a daily basis.
To get data from the API I will implement simple HTTP client with pydantic validation.
What we will download:
- Folders
- Activities
- Tags
- Time entries
Workouts, sleep, steps and many more – Garmin watches
I'm a proud owner of a Forerunner 265.
These watches are the best I've ever had.
The top feature for me is the battery life – 5-7 days.
Garmin collects a lot of metrics, but as a starting point we will focus on:
- Sleep
- Workouts
To download Garmin data I'll use the garminconnect Python library.
Logical model
This section heavily uses concepts from the Database Design Book written by Alexey Makhotkin, which I read with pleasure before creating the logical model for the project. The book is highly recommended reading. You might also be interested in the author's articles and author's Substack Minimal Modeling.
Also, I'm writing a review of the book.
You can notice I use lots of links to Alexey's articles throughout this post because they're quite useful and the author covers many data modeling concepts.
Quote from the book: "Logical model exists even if you refuse to acknowledge it. When not written explicitly it's scattered in 3 places: 1) physical db schema 2) system code 3) people's memory."
The Logical Model should become the single source of truth for my data warehouse.
If I want to build a new feature, I'll start with the Logical Model first and then move to the Physical Implementation.
You may think this is redundant bureaucracy for the pet project, but I try not to be "tactical tornado" here.
After checking the EARLY API and garminconnect library for what data they provide, we can extract anchors, attributes, and links for the logical model.
Anchors
We start from extracting entities or anchors (nouns).
I also added the "source" column so it's clear where the data comes from.
Name | ID examples | Table name (physical) | Sources |
---|---|---|---|
Time Folder | "280326" | time_folder | EARLY api |
Time Activity | "2052354" | time_activity | EARLY api |
Time Tag | 14831771 | time_tag | EARLY api |
Time Entry | "106227041" | time_entry | EARLY api |
Workout Type | 21 | workout_type | Garmin Connect |
Workout | 20150174780 | workout | Garmin Connect |
Sleep | 1752277343000 | sleep | Garmin Connect |
Attributes
I extracted this minimal set of attributes by answering questions.
Anchor | Question | Logical Data Type | Example | Column name and type (physical) |
---|---|---|---|---|
Time Folder | What is the name of the folder? | string | "0. Vio" | time_folder.name TEXT NOT NULL |
Time Folder | What is the description of the folder? | string | "Work" | time_folder.description TEXT NOT NULL |
Time Activity | What is the name of the activity? | string | "Gaming" | time_activity.name TEXT NOT NULL |
Time Activity | What is the status of the activity? | enum | "active", "inactive", "archived" | time_activity.status ENUM NOT NULL |
Time Activity | What is the description of the activity? | string | "Playing video games" | time_activity.description TEXT NOT NULL |
Time Tag | What is the label of the tag? | string | "crafting_interpreters" | time_tag.label TEXT NOT NULL |
Time Entry | When the entry was started? | timestamp in UTC | "2025-08-16 06:08:43+00:00" | time_entry.start_at TIMESTAMPTZ NOT NULL |
Time Entry | When the entry was ended? | timestamp in UTC | "2025-08-16 07:22:27+00:00" | time_entry.end_at TIMESTAMPTZ NOT NULL |
Workout Type | What is the name of the type? | string | "yoga" | workout_type.name TEXT NOT NULL |
Workout | What is the name of the workout? | string | "Yoga" | workout.name TEXT NOT NULL |
Workout | When the workout started? | timestamptz | "2025-08-23 10:19:05+02:00" | workout.start_at TIMESTAMPTZ NOT NULL |
Workout | When the workout ended? | timestamptz | "2025-08-23 10:31:56+02:00" | workout.end_at TIMESTAMPTZ NOT NULL |
Sleep | When the sleep started? | timestamptz | "2025-08-22 21:49:46+02:00" | sleep.start_at TIMESTAMPTZ NOT NULL |
Sleep | When the sleep ended? | timestamptz | "2025-08-23 05:57:46+02:00" | sleep.end_at TIMESTAMPTZ NOT NULL |
Links
Let's define relations between anchors.
Anchor1:Anchor2 | Cardinality | Sentences | Table or column name (physical) |
---|---|---|---|
Time Folder : Time Activity | 1:M | Time Folder contains multiple Time Activities. Time Activity is part of only one Time Folder | time_activity.time_folder_id |
Time Folder : Time Tag | 1:M | Time Folder contains multiple Time Tags. Time Tag is part of only one Time Folder | time_tag.time_folder_id |
Time Activity : Time Entry | 1:M | Time Activity contains multiple Time Entries. Time Entry is part of only one Time Activity | time_entry.time_activity_id |
Time Entry : Time Tag | M:N | Time Entry is labeled with multiple Time Tags. Time Tag can label multiple Time Entries | link_time_entry__time_tag |
Workout Type : Workout | 1:M | Workout Type contains multiple Workouts. Workout can be only one of Workout Type | workout.workout_type_id |
Physical ERD diagram
I use d2 to visualize the ERD diagram. It's a diagramming language that supports
SQL tables and Crow's foot notation.
I recommend reading these articles written by Alexey:
I didn't add ERD notation column to the logical design though.
This is a d2 code snippet showing how this diagram is built:
time_folder: {
shape: sql_table
id: text {constraint: primary_key}
name: text not null
description: text not null
}
time_activity: {
shape: sql_table
id: text {constraint: primary_key}
name: text not null
status: enum not null
description: text not null
time_folder_id: text {constraint: foreign_key}
}
time_activity.time_folder_id <-> time_folder.id: {
source-arrowhead.shape: cf-many
target-arrowhead.shape: cf-one-required
}
# ...
Note that we use Foreign Keys in the design. Later we will relax this requirement.
Why this should be relaxed is highlighted in the Foreign Keys @ Alexey Makhotkin article.
The problem is that we need to load two connected entities in a single transaction.
If we reload folders by simply overwriting the table, we truncate it, meaning all related articles and tags that depend on folders will be deleted.
If there is a failure loading the folders, then the activities and tags tables will be empty.
Summary
What do we have so far:
- Data sources were explored (EARLY and Garmin)
- Logical model was built with anchors, attributes, and links
- Physical design was created and ERD diagram was drawn
In the next post I will tell about DWH layers and raw data ingestion. Stay tuned and subscribe!
Top comments (0)