Tracking Video Progress with Bitmaps
Problem
Most LMS platforms track video completion as percentage or "watched/not watched".
This doesn't tell you:
- Which parts were actually watched vs skipped
- Which segments were rewatched
- Precise viewing patterns
Solution
Use PostgreSQL's varbit (variable-length bit string) where each bit represents one second of video.
- Bit = 1: watched
- Bit = 0: not watched
Screenshot
Orange: watched, Gray: unwatched
Why Bitmap?
Performance
Bitwise OR for merging:
existing_bits | new_bits
- Single CPU instruction
- No loops or conditionals
- All edge cases handled automatically
BIT_COUNT for analytics:
BIT_COUNT(watch_bits)
- PostgreSQL native function
- Fast calculation of watched seconds
Example:
Existing: 11110000
New: 00001111
Result: 11111111 (single operation merge)
Rewatching same segment? OR operation is idempotent.
Skip and come back? Handled automatically.
Multiple sessions? Just keep OR-ing.
Storage Efficiency
2-hour video = 7,200 bits = 900 bytes
With gzip: ~100-200 bytes (depends on watch pattern)
Compare to alternatives:
- Timestamp array: hundreds of integers = kilobytes
- Time ranges: complex merge logic + storage overhead
Alternatives Considered
-
Timestamp array
- Storage: Large
- Merge: Complex iteration
- Query: Slow
-
Time ranges
- Storage: Medium
- Merge: Complex overlap handling
- Query: Range intersection logic
-
Bitmap
- Storage: Compact
- Merge: Single OR operation
- Query: Native bit functions
Implementation
Model
class VarBitField(Field):
def db_type(self, connection: BaseDatabaseWrapper):
return "varbit"
class Watch(TimeStampedMixin):
user = ForeignKey(User, CASCADE, verbose_name=_("User"))
media = ForeignKey(Media, CASCADE, verbose_name=_("Media"))
last_position = FloatField(_("Last Position"))
watch_bits = VarBitField(_("Watch Bits"), blank=True, null=True)
rate = FloatField(_("Watch Rate"))
passed = BooleanField(_("Passed"), default=False)
context = CharField(_("Context Key"), max_length=255, blank=True, default="")
Update Logic
- Handle different bitmap lengths (resize if needed)
- Merge new bits with existing using OR operation
- Calculate watch rate from bit count
- Determine pass/fail based on threshold
@classmethod
async def update_media_watch(
cls, *, media_id: str, user_id: str, context: str, last_position: float, watch_bits: str | None
):
def _execute_update():
if watch_bits is None:
cls.objects.update_or_create(
media_id=media_id, user_id=user_id, context=context, defaults={"last_position": last_position}
)
return
if not watch_bits or set(watch_bits) - {"0", "1"}:
raise ValueError("watch_bits must be a non-empty bit string")
bit_length = len(watch_bits)
bit_count = watch_bits.count("1")
table = cls._meta.db_table
media_table = Media._meta.db_table
sql = f"""
WITH
input_bits AS (
SELECT '{watch_bits}'::varbit AS bits
),
media_info AS (
SELECT passing_point
FROM {media_table}
WHERE id = %(media_id)s
),
existing AS (
SELECT watch_bits FROM {table}
WHERE user_id = %(user_id)s AND media_id = %(media_id)s AND context = %(context)s
),
merged AS (
SELECT
CASE
WHEN e.watch_bits IS NULL THEN
i.bits
WHEN LENGTH(e.watch_bits) < %(bit_length)s THEN
RPAD(e.watch_bits::text, %(bit_length)s, '0')::varbit | i.bits
WHEN LENGTH(e.watch_bits) > %(bit_length)s THEN
SUBSTRING(e.watch_bits, 1, %(bit_length)s) | i.bits
ELSE
e.watch_bits | i.bits
END AS bits
FROM input_bits i
LEFT JOIN existing e ON true
),
final AS (
SELECT bits, BIT_COUNT(bits) AS bit_count
FROM merged
)
INSERT INTO {table} (
user_id, media_id, context, watch_bits, rate, passed, last_position, created, modified
)
VALUES (
%(user_id)s,
%(media_id)s,
%(context)s,
(SELECT bits FROM input_bits),
%(rate)s,
%(rate)s >= (SELECT passing_point FROM media_info),
%(last_position)s,
NOW(),
NOW()
)
ON CONFLICT (user_id, media_id, context)
DO UPDATE SET
watch_bits = (SELECT bits FROM final),
rate = (SELECT bit_count FROM final) * 100.0 / %(bit_length)s,
passed = ((SELECT bit_count FROM final) * 100.0 / %(bit_length)s) >= (SELECT passing_point FROM media_info),
last_position = EXCLUDED.last_position,
modified = NOW();
"""
params = {
"media_id": media_id,
"user_id": user_id,
"context": context,
"last_position": last_position,
"rate": bit_count * 100.0 / bit_length,
"bit_length": bit_length,
}
with connection.cursor() as cursor:
cursor.execute(sql, params)
await sync_to_async(_execute_update, thread_sensitive=True)()
Key operations:
-
e.watch_bits | i.bits- Bitwise OR merge -
BIT_COUNT(bits)- Count watched seconds - RPAD/SUBSTRING - Automatic length handling
Data Transfer
Bitmaps can be large (2-hour video = 7200 bits).
Compress with gzip before sending over network.
class WatchOutSchema(Schema):
last_position: float
watch_bits: Annotated[GzipOutEncodedType | None, Field(None, description="Gzip compressed Bit String")]
class WatchInSchema(Schema):
last_position: float
watch_bits: Annotated[GzipInEncodedType, Field(None, description="Gzip compressed Bit String")]
def gzip_compress(data: bytes, level: int = 9):
return gzip.compress(data, compresslevel=level)
def gzip_decompress(data: bytes):
return gzip.decompress(data)
class GzipInEncoder(EncoderProtocol):
@classmethod
def decode(cls, data: bytes):
decoded_base64 = base64.b64decode(data)
decompressed = gzip_decompress(decoded_base64)
return decompressed
@classmethod
def encode(cls, value: bytes):
return value
@classmethod
def get_json_format(cls):
return ""
GzipInEncodedType = Annotated[str, EncodedStr(encoder=GzipInEncoder)]
class GzipOutEncoder(EncoderProtocol):
@classmethod
def decode(cls, data: bytes):
return data
@classmethod
def encode(cls, value: bytes):
compressed = gzip_compress(value)
encoded_base64 = base64.b64encode(compressed)
return encoded_base64
@classmethod
def get_json_format(cls):
return ""
GzipOutEncodedType = Annotated[str, EncodedStr(encoder=GzipOutEncoder)]
API
@router.post("/media/{id}/watch")
async def update_media_watch(request: HttpRequest, id: str, data: WatchInSchema):
await Watch.update_media_watch(
media_id=id,
user_id=request.auth,
context=request.active_context,
last_position=data.last_position,
watch_bits=data.watch_bits,
)
Results
- Second-by-second accurate tracking
- Edge cases (pause, skip, rewatch) handled without special code
- Compact storage for long videos
- Fast queries using PostgreSQL bit operations
Example
Video: 100 seconds
User watches: 0-30s, skips to 60s, watches 60-80s
Bitmap
1111111111111111111111111111110000000000000000000000000000001111111111111111111100000000000000000000
Watch rate: 50/100 = 50%

Top comments (0)