DEV Community

Dora
Dora

Posted on

Tracking Video Progress with Bitmaps

Tracking Video Progress with Bitmaps

Problem

Most LMS platforms track video completion as percentage or "watched/not watched".
This doesn't tell you:

  • Which parts were actually watched vs skipped
  • Which segments were rewatched
  • Precise viewing patterns

Solution

Use PostgreSQL's varbit (variable-length bit string) where each bit represents one second of video.

  • Bit = 1: watched
  • Bit = 0: not watched

Screenshot

Orange: watched, Gray: unwatched

Why Bitmap?

Performance

Bitwise OR for merging:

existing_bits | new_bits
Enter fullscreen mode Exit fullscreen mode
  • Single CPU instruction
  • No loops or conditionals
  • All edge cases handled automatically

BIT_COUNT for analytics:

BIT_COUNT(watch_bits)
Enter fullscreen mode Exit fullscreen mode
  • PostgreSQL native function
  • Fast calculation of watched seconds

Example:

Existing: 11110000
New:      00001111
Result:   11111111  (single operation merge)
Enter fullscreen mode Exit fullscreen mode

Rewatching same segment? OR operation is idempotent.
Skip and come back? Handled automatically.
Multiple sessions? Just keep OR-ing.

Storage Efficiency

2-hour video = 7,200 bits = 900 bytes
With gzip: ~100-200 bytes (depends on watch pattern)

Compare to alternatives:

  • Timestamp array: hundreds of integers = kilobytes
  • Time ranges: complex merge logic + storage overhead

Alternatives Considered

  1. Timestamp array

    • Storage: Large
    • Merge: Complex iteration
    • Query: Slow
  2. Time ranges

    • Storage: Medium
    • Merge: Complex overlap handling
    • Query: Range intersection logic
  3. Bitmap

    • Storage: Compact
    • Merge: Single OR operation
    • Query: Native bit functions

Implementation

Model

class VarBitField(Field):
    def db_type(self, connection: BaseDatabaseWrapper):
        return "varbit"

class Watch(TimeStampedMixin):
    user = ForeignKey(User, CASCADE, verbose_name=_("User"))
    media = ForeignKey(Media, CASCADE, verbose_name=_("Media"))
    last_position = FloatField(_("Last Position"))
    watch_bits = VarBitField(_("Watch Bits"), blank=True, null=True)
    rate = FloatField(_("Watch Rate"))
    passed = BooleanField(_("Passed"), default=False)
    context = CharField(_("Context Key"), max_length=255, blank=True, default="")
Enter fullscreen mode Exit fullscreen mode

Update Logic

  1. Handle different bitmap lengths (resize if needed)
  2. Merge new bits with existing using OR operation
  3. Calculate watch rate from bit count
  4. Determine pass/fail based on threshold
@classmethod
async def update_media_watch(
    cls, *, media_id: str, user_id: str, context: str, last_position: float, watch_bits: str | None
):
    def _execute_update():
        if watch_bits is None:
            cls.objects.update_or_create(
                media_id=media_id, user_id=user_id, context=context, defaults={"last_position": last_position}
            )
            return

        if not watch_bits or set(watch_bits) - {"0", "1"}:
            raise ValueError("watch_bits must be a non-empty bit string")

        bit_length = len(watch_bits)
        bit_count = watch_bits.count("1")

        table = cls._meta.db_table
        media_table = Media._meta.db_table

        sql = f"""
            WITH
            input_bits AS (
                SELECT '{watch_bits}'::varbit AS bits
            ),
            media_info AS (
                SELECT passing_point
                FROM {media_table}
                WHERE id = %(media_id)s
            ),
            existing AS (
                SELECT watch_bits FROM {table} 
                WHERE user_id = %(user_id)s AND media_id = %(media_id)s AND context = %(context)s
            ),
            merged AS (
                SELECT
                    CASE
                        WHEN e.watch_bits IS NULL THEN
                            i.bits
                        WHEN LENGTH(e.watch_bits) < %(bit_length)s THEN
                            RPAD(e.watch_bits::text, %(bit_length)s, '0')::varbit | i.bits
                        WHEN LENGTH(e.watch_bits) > %(bit_length)s THEN
                            SUBSTRING(e.watch_bits, 1, %(bit_length)s) | i.bits
                        ELSE
                            e.watch_bits | i.bits
                    END AS bits
                FROM input_bits i
                LEFT JOIN existing e ON true
            ),
            final AS (
                SELECT bits, BIT_COUNT(bits) AS bit_count
                FROM merged
            )
            INSERT INTO {table} (
                user_id, media_id, context, watch_bits, rate, passed, last_position, created, modified
            )
            VALUES (
                %(user_id)s,
                %(media_id)s,
                %(context)s,
                (SELECT bits FROM input_bits),
                %(rate)s,
                %(rate)s >= (SELECT passing_point FROM media_info),
                %(last_position)s,
                NOW(),
                NOW()
            )
            ON CONFLICT (user_id, media_id, context)
            DO UPDATE SET
                watch_bits = (SELECT bits FROM final),
                rate = (SELECT bit_count FROM final) * 100.0 / %(bit_length)s,
                passed = ((SELECT bit_count FROM final) * 100.0 / %(bit_length)s) >= (SELECT passing_point FROM media_info),
                last_position = EXCLUDED.last_position,
                modified = NOW();
        """

        params = {
            "media_id": media_id,
            "user_id": user_id,
            "context": context,
            "last_position": last_position,
            "rate": bit_count * 100.0 / bit_length,
            "bit_length": bit_length,
        }

        with connection.cursor() as cursor:
            cursor.execute(sql, params)

    await sync_to_async(_execute_update, thread_sensitive=True)()
Enter fullscreen mode Exit fullscreen mode

Key operations:

  • e.watch_bits | i.bits - Bitwise OR merge
  • BIT_COUNT(bits) - Count watched seconds
  • RPAD/SUBSTRING - Automatic length handling

Data Transfer

Bitmaps can be large (2-hour video = 7200 bits).
Compress with gzip before sending over network.

class WatchOutSchema(Schema):
    last_position: float
    watch_bits: Annotated[GzipOutEncodedType | None, Field(None, description="Gzip compressed Bit String")]

class WatchInSchema(Schema):
    last_position: float
    watch_bits: Annotated[GzipInEncodedType, Field(None, description="Gzip compressed Bit String")]
Enter fullscreen mode Exit fullscreen mode
def gzip_compress(data: bytes, level: int = 9):
    return gzip.compress(data, compresslevel=level)

def gzip_decompress(data: bytes):
    return gzip.decompress(data)

class GzipInEncoder(EncoderProtocol):
    @classmethod
    def decode(cls, data: bytes):
        decoded_base64 = base64.b64decode(data)
        decompressed = gzip_decompress(decoded_base64)
        return decompressed

    @classmethod
    def encode(cls, value: bytes):
        return value

    @classmethod
    def get_json_format(cls):
        return ""

GzipInEncodedType = Annotated[str, EncodedStr(encoder=GzipInEncoder)]

class GzipOutEncoder(EncoderProtocol):
    @classmethod
    def decode(cls, data: bytes):
        return data

    @classmethod
    def encode(cls, value: bytes):
        compressed = gzip_compress(value)
        encoded_base64 = base64.b64encode(compressed)
        return encoded_base64

    @classmethod
    def get_json_format(cls):
        return ""

GzipOutEncodedType = Annotated[str, EncodedStr(encoder=GzipOutEncoder)]
Enter fullscreen mode Exit fullscreen mode

API

@router.post("/media/{id}/watch")
async def update_media_watch(request: HttpRequest, id: str, data: WatchInSchema):
    await Watch.update_media_watch(
        media_id=id,
        user_id=request.auth,
        context=request.active_context,
        last_position=data.last_position,
        watch_bits=data.watch_bits,
    )
Enter fullscreen mode Exit fullscreen mode

Results

  • Second-by-second accurate tracking
  • Edge cases (pause, skip, rewatch) handled without special code
  • Compact storage for long videos
  • Fast queries using PostgreSQL bit operations

Example

Video: 100 seconds
User watches: 0-30s, skips to 60s, watches 60-80s

Bitmap

1111111111111111111111111111110000000000000000000000000000001111111111111111111100000000000000000000
Enter fullscreen mode Exit fullscreen mode

Watch rate: 50/100 = 50%

The full implementation

https://github.com/cobel1024/minima

Top comments (0)