<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arnab Datta</title>
    <description>The latest articles on DEV Community by Arnab Datta (@arnab500th).</description>
    <link>https://dev.to/arnab500th</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818372%2F3c17c00d-7efd-4aee-84f5-fe9658073d32.jpeg</url>
      <title>DEV Community: Arnab Datta</title>
      <link>https://dev.to/arnab500th</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arnab500th"/>
    <language>en</language>
    <item>
      <title>Rapid Interest Shifts in Recommender Systems: A Case Study on Instagram Reels</title>
      <dc:creator>Arnab Datta</dc:creator>
      <pubDate>Thu, 16 Apr 2026 08:09:43 +0000</pubDate>
      <link>https://dev.to/arnab500th/rapid-interest-shifts-in-recommender-systems-a-case-study-on-instagram-reels-1eh1</link>
      <guid>https://dev.to/arnab500th/rapid-interest-shifts-in-recommender-systems-a-case-study-on-instagram-reels-1eh1</guid>
      <description>&lt;h3&gt;
  
  
  A late-night experiment revealing how fast recommendation systems actually adapt
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;An informal, timestamped experiment showing how quickly Instagram's recommendation system adapts to new inputs — often within minutes — and what that reveals about modern recommender systems.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Observations (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Feed adaptation latency: &lt;strong&gt;~2 minutes&lt;/strong&gt; consistently across genres&lt;/li&gt;
&lt;li&gt;Subgenre-level clustering observed (not just category-level)&lt;/li&gt;
&lt;li&gt;Content classification appears independent of hashtags&lt;/li&gt;
&lt;li&gt;Cross-user candidate pool overlap observed&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flll4r61z5x9psuzyeowz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flll4r61z5x9psuzyeowz.png" alt="Time Line" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Accidental Beginning
&lt;/h2&gt;

&lt;p&gt;This wasn't supposed to be any glorified research. It started as harassment.&lt;/p&gt;

&lt;p&gt;I was chatting with a friend late at night and casually sent her a reel — a (golgappa) street food video, nothing special. Eight minutes later she sent one back. Then another. Then at 11:29 PM she messaged me: &lt;em&gt;her entire feed is just food now&lt;/em&gt;. Which, to be fair, was the objective.&lt;/p&gt;

&lt;p&gt;I laughed. Then I got curious. Then I got obsessive about it.&lt;/p&gt;

&lt;p&gt;What followed was a highly controlled two-hour experimental session (i.e., I spammed her with reels) across completely different genres — food, coding, anime, gaming, gym, Harry Potter — timing exactly how fast her feed shifted each time. She was a very willing participant (trust me).&lt;/p&gt;

&lt;p&gt;What I found was genuinely surprising, and it lines up in interesting ways with what we know about how Instagram's recommendation system actually works under the hood.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on methodology:&lt;/strong&gt; This wasn't a controlled lab experiment. We were actively chatting throughout, her app was in normal use the whole time, and I wasn't running any formal measurement tools. These are real timestamps from our chat logs. I'd call it a &lt;em&gt;naturalistic informal experiment&lt;/em&gt; — messy, but honest. I think that actually makes it more interesting.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Friend Factor: Why Her Feed and Not Mine
&lt;/h2&gt;

&lt;p&gt;Before the timeline, one thing worth noting: &lt;strong&gt;the same reels I sent her barely affected my own feed.&lt;/strong&gt; My feed stayed mostly stable throughout the night. Hers was rewriting itself every few minutes.&lt;/p&gt;

&lt;p&gt;This asymmetry is the most interesting starting observation. We're both active users on old accounts. So why was she so much more "algorithmically reactive"?&lt;/p&gt;

&lt;p&gt;Toward the end of our (very productive) experiment she started sending me reels from her shifting feed — but almost none of it migrated to mine. The reels she sent me just didn't move the needle. Same input, completely different output.&lt;/p&gt;

&lt;p&gt;My hypothesis — which I'll come back to later — is what I'm calling &lt;strong&gt;low engagement inertia&lt;/strong&gt;. But first, the meticulously observed data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Timeline
&lt;/h2&gt;

&lt;p&gt;Here's what actually happened, timestamped from our chat:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1 — Food (10:56 PM to 11:29 PM)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;What I Did&lt;/th&gt;
&lt;th&gt;What Happened&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10:56 PM&lt;/td&gt;
&lt;td&gt;Sent her a golgappa reel&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:04 PM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends a golgappa reel back&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:11 PM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends more golgappa reels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:12 PM&lt;/td&gt;
&lt;td&gt;Sent 2–3 more food reels&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:20 PM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She starts sending food reels unprompted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:29 PM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Her full feed is just food"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;33 minutes from first reel → complete feed takeover.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I felt like a god, able to manipulate a feed at will.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2 — The Concept Test (12:12 AM to 12:33 AM)
&lt;/h3&gt;

&lt;p&gt;This is where things got interesting.&lt;/p&gt;

&lt;p&gt;At 12:12 AM I sent her a specific meme format — the kind where one person says &lt;em&gt;"if no one told you today, you're such a good mother"&lt;/em&gt; and then the account creator reacts with something like &lt;em&gt;"bro I became a mother just scrolling."&lt;/em&gt; Just a joke between us. Not a genre reel, no obvious category.&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;12:18 AM — six minutes later&lt;/strong&gt; — she sent me a reel with the exact same &lt;strong&gt;first half&lt;/strong&gt; — same video — but a completely different reaction from a different creator.&lt;/p&gt;

&lt;p&gt;The behavior suggests the system may be matching structural patterns in content, not just hashtags or genre labels. Two reels, different creators, same comedic template. That's a surprisingly granular level of content understanding.&lt;/p&gt;

&lt;p&gt;Then at 12:30 AM I sent a coding reel. By &lt;strong&gt;12:33 AM&lt;/strong&gt; she was sending me back multiple coding reels.&lt;/p&gt;

&lt;p&gt;Then something genuinely weird happened. Around the same time, a near-identical reel — far removed from coding — appeared on &lt;strong&gt;both&lt;/strong&gt; our feeds simultaneously: same creator, same video, but with different audio and a different concept.&lt;/p&gt;

&lt;p&gt;Well that was not supposed to happen.&lt;/p&gt;

&lt;p&gt;We'd apparently both been pulled from the same creator's content pool at the same moment — probably because of candidate generation overlap when two users get freshly classified into similar interest clusters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3 — Pushing the Limits (1:01 AM to 1:17 AM)
&lt;/h3&gt;

&lt;p&gt;By now I had fully abandoned the pretense that this was a normal conversation and was just running (very productive) experiments on my willing participant's feed at 1 AM, to see how consistent these shifts could get and how much entirely new content would change her feed.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;What I Did&lt;/th&gt;
&lt;th&gt;What Happened&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1:01 AM&lt;/td&gt;
&lt;td&gt;Sent Valorant reel&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:02 AM&lt;/td&gt;
&lt;td&gt;Sent &lt;em&gt;Your Lie in April&lt;/em&gt; anime reel&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:03 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends a PUBG reel (~2 min after gaming reel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:03 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends &lt;em&gt;I Want to Eat Your Pancreas&lt;/em&gt; reel (same sad/romantic anime concept)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:03 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends a &lt;em&gt;Your Lie in April&lt;/em&gt; reel specifically — &lt;strong&gt;with my like on it&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:04 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;FF reel, then continuous anime + gaming content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:08 AM&lt;/td&gt;
&lt;td&gt;Sent a gym reel (sub-100 likes — not viral)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:10 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends gym reel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:15 AM&lt;/td&gt;
&lt;td&gt;Sent Harry Potter edit reel&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1:17 AM&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;She sends 2 Harry Potter reels&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The consistent pattern: &lt;strong&gt;~2 minutes from send to feed shift&lt;/strong&gt;, across completely unrelated genres, back to back. At this point, the system wasn’t reacting. It was anticipating.&lt;/p&gt;

&lt;p&gt;The anime response deserves special attention. I sent a &lt;em&gt;Your Lie in April&lt;/em&gt; reel. Within 2 minutes she received:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A different anime with the same emotional subgenre (sad, romantic)&lt;/li&gt;
&lt;li&gt;Then the exact same anime I sent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is consistent with subgenre-level clustering — the system appears to track not just "anime" as a category, but the emotional and stylistic signature within it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The No-Hashtag Reel
&lt;/h3&gt;

&lt;p&gt;For the gym test I specifically chose a reel with under 100 likes — not viral, not trending, just a guy doing lunges in mediocre lighting. It worked anyway.&lt;/p&gt;

&lt;p&gt;One of the gaming reels that appeared on her feed after my Valorant send had &lt;strong&gt;zero hashtags&lt;/strong&gt;. No caption, no tags, no metadata hints whatsoever.&lt;/p&gt;

&lt;p&gt;The system still correctly categorized it as gaming content and served it within the 2-minute window. This is consistent with Instagram using visual and audio content analysis rather than relying solely on metadata — it wasn't reading the label. It was watching the video.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens: The Technical Explanation
&lt;/h2&gt;

&lt;p&gt;Based on publicly available information about how Meta's recommendation systems work, here's what's likely going on:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pipeline
&lt;/h3&gt;

&lt;p&gt;Instagram's Reels recommendation isn't one model — it's a multi-stage pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Candidate Generation&lt;/strong&gt; — pulls a large pool of potential reels from followed accounts, trending content, and category clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First-Stage Ranking&lt;/strong&gt; — a lightweight model scores candidates quickly (Instagram reportedly uses a Two Towers neural network here, which can cache embeddings efficiently)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second-Stage Ranking&lt;/strong&gt; — a heavier multi-task model (MTML) predicts engagement probability for top candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking + Filters&lt;/strong&gt; — diversity rules, content moderation, eligibility checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reels Chaining&lt;/strong&gt; — selects what plays &lt;em&gt;next&lt;/em&gt; to keep the session going within a content cluster&lt;/li&gt;
&lt;/ol&gt;
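&lt;p&gt;The funnel above can be sketched in a few lines of JavaScript. This is a toy illustration — the field names (&lt;code&gt;embedding&lt;/code&gt;, &lt;code&gt;predictedEngagement&lt;/code&gt;, &lt;code&gt;creator&lt;/code&gt;) and the scoring are my assumptions, not Instagram's actual code:&lt;/p&gt;

```javascript
// Toy multi-stage ranking funnel (illustrative only, not Instagram's code).
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function recommend(candidates, userVec) {
  // Stage 2: cheap two-tower-style score, keep only the top candidates
  const firstPass = candidates
    .map(c => ({ ...c, score: dot(userVec, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 50);
  // Stage 3: heavier-model stand-in — reweight by an engagement estimate
  const secondPass = firstPass
    .map(c => ({ ...c, score: c.score * c.predictedEngagement }))
    .sort((a, b) => b.score - a.score);
  // Stage 4: trivial diversity rule — no two consecutive reels from one creator
  const out = [];
  for (const c of secondPass) {
    if (out.length === 0 || out[out.length - 1].creator !== c.creator) out.push(c);
  }
  return out;
}
```

&lt;p&gt;Even this toy shows the key property: the expensive model only ever sees a small, pre-filtered slice of the candidate pool, which is what makes per-session re-ranking cheap enough to run constantly.&lt;/p&gt;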

&lt;p&gt;The speed we observed (~2 minutes) suggests the interest profile update is happening in near real-time — what's called &lt;strong&gt;online learning&lt;/strong&gt;, where user interaction signals stream into the system without requiring a full model retrain.&lt;/p&gt;
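&lt;p&gt;A minimal sketch of what such an online update could look like — an exponential moving average nudging the user's interest vector toward each watched item's embedding. The form and the parameter &lt;code&gt;alpha&lt;/code&gt; are my assumptions, not Meta's documented implementation:&lt;/p&gt;

```javascript
// Hypothetical online interest update: move the user vector a fraction
// alpha of the way toward the embedding of the item just watched.
function updateInterestVector(userVec, itemVec, alpha) {
  return userVec.map((u, i) => (1 - alpha) * u + alpha * itemVec[i]);
}
```

&lt;p&gt;Repeated sends compound fast: with &lt;code&gt;alpha = 0.3&lt;/code&gt;, three back-to-back watches in a new genre move the vector roughly 66% of the way there (1 − 0.7³ ≈ 0.657) — which would be enough to explain a feed flipping within minutes.&lt;/p&gt;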

&lt;h3&gt;
  
  
  Why Content Gets Classified Without Hashtags
&lt;/h3&gt;

&lt;p&gt;Instagram's behavior is consistent with running &lt;strong&gt;computer vision and audio analysis&lt;/strong&gt; on every reel, independent of metadata. The system can identify objects, scenes, on-screen text, audio patterns, and visual style. A gaming reel with zero hashtags still has game UI on screen, specific audio, and recognizable visual patterns — enough to generate a content embedding without any metadata hints at all.&lt;/p&gt;
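&lt;p&gt;Conceptually, a multimodal classifier of this kind fuses per-modality features into a single content vector. A hypothetical sketch — the function and the concatenate-and-normalise scheme are illustrative stand-ins, not Instagram's architecture:&lt;/p&gt;

```javascript
// Hypothetical fusion of per-modality embeddings into one content vector.
// Real systems use learned fusion layers; concatenation is the simplest stand-in.
function fuseEmbeddings(visionVec, audioVec, textVec) {
  const v = [...visionVec, ...audioVec, ...textVec];
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / norm); // L2-normalise so dot products act like cosine
}
```

&lt;p&gt;The point is that the text slot can be empty (no hashtags, no caption) and the vector is still well-defined — vision and audio alone carry enough signal to place the reel in a cluster.&lt;/p&gt;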

&lt;h3&gt;
  
  
  Why She Got the Same Anime Subgenre (Not Just "Anime")
&lt;/h3&gt;

&lt;p&gt;The Two Towers model doesn't classify content into broad buckets — it generates &lt;strong&gt;dense embedding vectors&lt;/strong&gt; that capture fine-grained stylistic and thematic features. &lt;em&gt;Your Lie in April&lt;/em&gt; and &lt;em&gt;I Want to Eat Your Pancreas&lt;/em&gt; likely sit close together in embedding space because they share visual aesthetics, pacing, color palette, and emotional tone — not just the genre tag "anime."&lt;/p&gt;

&lt;p&gt;When I sent her one, the system updated her interest vector toward that specific region of embedding space, and served content from the same neighborhood.&lt;/p&gt;
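&lt;p&gt;"Same neighborhood" here just means high cosine similarity between embedding vectors. A toy illustration with made-up 3-D vectors (real embeddings have hundreds of dimensions; the axis labels are invented):&lt;/p&gt;

```javascript
// Cosine similarity: 1 = same direction, 0 = unrelated.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Invented axes: (sad-romance, slice-of-life, action)
const yourLieInApril = [0.90, 0.10, 0.00];
const pancreas       = [0.85, 0.15, 0.05]; // I Want to Eat Your Pancreas
const shonenAction   = [0.10, 0.20, 0.95];
```

&lt;p&gt;&lt;code&gt;cosine(yourLieInApril, pancreas)&lt;/code&gt; comes out near 1, while the action vector scores far lower — so a nearest-neighbor lookup seeded by one sad-romance anime naturally returns the other.&lt;/p&gt;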

&lt;h3&gt;
  
  
  The Asymmetry: Why My Feed Didn't Change
&lt;/h3&gt;

&lt;p&gt;My feed has years of varied engagement history — diverse genres, lots of signals, a well-established taste graph. Shifting it requires overcoming accumulated inertia.&lt;/p&gt;

&lt;p&gt;Her feed, by contrast, appears to have a &lt;strong&gt;lower signal diversity&lt;/strong&gt; — not because she's a new or inactive user, but because her engagement pattern is cleaner and less fragmented. Each watch signal she sends is relatively uncontested, so new signals propagate faster.&lt;/p&gt;

&lt;p&gt;I'd call this &lt;strong&gt;low engagement inertia&lt;/strong&gt; — a state where the algorithm has a highly responsive, low-noise profile to work with. It's not a flaw in the system. It's the system working exactly as designed, just made visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;This was a single informal observation, not a controlled study. Some important caveats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-user observation (n=1)&lt;/li&gt;
&lt;li&gt;No control over watch time, likes, or skips during the session&lt;/li&gt;
&lt;li&gt;Background app activity may have influenced results&lt;/li&gt;
&lt;li&gt;Feed state prior to the experiment was not fully quantified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These results should be interpreted as exploratory rather than conclusive. The patterns are consistent with known system behavior, but cannot be treated as proof of specific mechanisms.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Actually Means
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Filter bubbles can shift in minutes, not days.&lt;/strong&gt; The popular narrative is that algorithmic filter bubbles form slowly over time. What we saw suggests that at least for users with low engagement inertia, a single session can substantially reorient the feed. That has real implications for how quickly someone can get pulled into a content rabbit hole.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The system understands content, not just categories.&lt;/strong&gt; The concept-level match at 12:18 AM and the no-hashtag gaming reel both point to the same thing: Instagram's content understanding appears to go well beyond keyword matching. Hashtags are a hint, not a requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intentional feed influence is surprisingly accessible.&lt;/strong&gt; I shifted her feed across six completely different genres in one night by simply sending her reels. Anyone could do this — a friend, a family member, or theoretically someone with less benign intent. The system has no apparent mechanism to distinguish "organic interest signal" from "someone else sent this to you."&lt;/p&gt;




&lt;p&gt;I started this by sending my friend a food reel as a bit. I ended it two hours later having documented six genre shifts, a concept-level meme match, and a hashtag-free classification. She has not forgiven me. Understandably.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;In the right conditions, your feed isn't a reflection of your interests — it's a reflection of your last 10 minutes.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/how-instagram-reel-uses-recommender-systems/" rel="noopener noreferrer"&gt;How Instagram Reel Uses Recommender Systems — GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://abhiverse01.medium.com/how-instagram-reels-recommendations-actually-work-behind-the-scenes-84e59eb7059e" rel="noopener noreferrer"&gt;How Instagram Reels Recommendations Actually Work Behind the Scenes — Abhishek Shah, Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.accio.com/blog/how-does-instagram-reels-algorithm-work-a-superb-guide" rel="noopener noreferrer"&gt;How Does Instagram Reels Algorithm Work: A Superb Guide — Roy Nnalue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Thanks to my friend — credited as "the unwilling test subject" at her request — for tolerating two hours of me hijacking her feed. This was originally her repayment for me harassing her algorithm.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Written at 4 AM while sleep-deprived. At no point did this feel like a bad idea to me, at least.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this interesting, I write about ML, computer science, and things I accidentally stumble into at &lt;a href="https://dev.to/arnab500th"&gt;@Arnab500th&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#datascience&lt;/code&gt; &lt;code&gt;#socialmedia&lt;/code&gt; &lt;code&gt;#technology&lt;/code&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>socialmedia</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
    <item>
      <title>I Built a Chrome Extension to Bypass Spotify's Mini-Player Paywall (Because I'm a Pirate 🏴‍☠️)</title>
      <dc:creator>Arnab Datta</dc:creator>
      <pubDate>Sun, 29 Mar 2026 12:16:38 +0000</pubDate>
      <link>https://dev.to/arnab500th/i-built-a-chrome-extension-to-bypass-spotifys-mini-player-paywall-because-im-a-pirate--41e5</link>
      <guid>https://dev.to/arnab500th/i-built-a-chrome-extension-to-bypass-spotifys-mini-player-paywall-because-im-a-pirate--41e5</guid>
      <description>&lt;p&gt;I love listening to music while coding.&lt;/p&gt;

&lt;p&gt;Not background noise. Not lo-fi beats. Actual music — I want to see what's playing, change tracks without breaking flow, and keep the player somewhere visible so I can glance at it without switching tabs.&lt;/p&gt;

&lt;p&gt;Spotify Web has a mini-player. You probably know this. What you might also know is that &lt;strong&gt;it's locked behind Premium&lt;/strong&gt;. Free tier users get the full-page player or nothing. You can't make it small. You can't tuck it into a corner. Spotify decided that's a Premium feature.&lt;/p&gt;

&lt;p&gt;I disagreed.&lt;/p&gt;

&lt;p&gt;So I built my own.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem (In Case You've Never Hit This)
&lt;/h2&gt;

&lt;p&gt;Here's the workflow I wanted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spotify Web open in one tab, playing music&lt;/li&gt;
&lt;li&gt;A tiny floating player sitting in the corner of my screen&lt;/li&gt;
&lt;li&gt;Full controls — play, pause, skip, seek, volume — without switching tabs&lt;/li&gt;
&lt;li&gt;A Picture-in-Picture window I can drag to another monitor while I code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spotify's answer: &lt;strong&gt;pay for Premium&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My answer: build a Chrome Extension that injects its own mini-player directly onto the page.&lt;/p&gt;

&lt;p&gt;Will I submit it to the Chrome Web Store? Probably not — it's kind of piracy and I don't think they'd approve it. But who cares. I'm a pirate. 🏴‍☠️&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Spotify Float&lt;/strong&gt; is a Chrome Extension (Manifest V3) that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Injects a floating, draggable, resizable mini-player into &lt;code&gt;open.spotify.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Supports three modes: Full (art + info + controls), Compact, and Mini pill&lt;/li&gt;
&lt;li&gt;Has a real &lt;strong&gt;Document Picture-in-Picture&lt;/strong&gt; window — a separate always-on-top OS window showing the album art with controls on hover&lt;/li&gt;
&lt;li&gt;Works entirely on the &lt;strong&gt;free tier&lt;/strong&gt; — no Spotify API, no login, no data collection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rusyvox214km553qz6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rusyvox214km553qz6k.png" alt="Spotify Float Mini Player" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And how it works:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/UGurwvHmOHs"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;The interesting part: this extension &lt;strong&gt;doesn't use the Spotify API at all&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Spotify's Web Player is a React app. All the playback state lives in the DOM — track titles, artist names, play state, progress — all exposed through &lt;code&gt;data-testid&lt;/code&gt; attributes and &lt;code&gt;aria-label&lt;/code&gt; values. So instead of going through any API, I just... read the DOM directly and simulate clicks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Finding the play button — no API needed&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;SELECTORS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;playPauseButton&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[data-testid="control-button-playpause"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button[aria-label="Play"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button[aria-label="Pause"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...more fallbacks&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every control has a priority-ordered list of selectors. The primary selector is &lt;code&gt;data-testid&lt;/code&gt; (the most stable), with CSS-class fallbacks for when Spotify updates its DOM.&lt;/p&gt;
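&lt;p&gt;A resolver over such a priority list is only a few lines. This is a sketch of the idea rather than the extension's exact helper; &lt;code&gt;root&lt;/code&gt; stands in for &lt;code&gt;document&lt;/code&gt; so the function is testable outside a browser:&lt;/p&gt;

```javascript
// Try each selector in priority order; return the first element that matches.
function resolveSelector(selectors, root) {
  for (const sel of selectors) {
    const el = root.querySelector(sel);
    if (el) return el;
  }
  return null; // nothing matched — Spotify's DOM may have changed again
}
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; instead of throwing matters here: when Spotify ships a redesign, the player should degrade gracefully (a dead button) rather than crash the whole content script.&lt;/p&gt;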

&lt;h3&gt;
  
  
  The Sync Loop
&lt;/h3&gt;

&lt;p&gt;Every 500ms (or 2000ms when paused), &lt;code&gt;syncNow()&lt;/code&gt; reads the current state from the DOM and updates the floating player:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MutationObserver (DOM changes on the now-playing widget)
    ↓ debounce 200ms
    ↓
syncNow()
  ├── readText('trackTitle')
  ├── readText('artistName')
  ├── cachedResolve('albumArt')
  ├── calcProgress()  — 3-strategy fallback
  ├── playBtn aria-label → play state
  ├── shuffleBtn aria-label → shuffle state
  └── repeatBtn aria-label → repeat mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sync loop &lt;strong&gt;completely stops&lt;/strong&gt; when the mini-player is hidden — no background polling, no CPU waste.&lt;/p&gt;
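&lt;p&gt;That polling policy reduces to one pure function plus a self-rescheduling timer — a sketch of the behaviour described above, not the extension's exact code:&lt;/p&gt;

```javascript
// Polling policy: fast while playing, slow while paused, off while hidden.
function nextSyncDelay(state) {
  if (!state.visible) return null;   // hidden → stop the loop entirely
  return state.playing ? 500 : 2000; // ms until the next syncNow() call
}

// Self-rescheduling loop: re-reads the state before every tick, so a
// pause or hide takes effect at the next scheduling decision.
function scheduleSync(getState, syncNow) {
  const delay = nextSyncDelay(getState());
  if (delay === null) return;
  setTimeout(() => { syncNow(); scheduleSync(getState, syncNow); }, delay);
}
```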

&lt;h3&gt;
  
  
  Seeking Without an API
&lt;/h3&gt;

&lt;p&gt;Seeking was the trickiest part. Spotify's progress bar is a range input, but you can't just set &lt;code&gt;.value&lt;/code&gt; — React controls it and ignores direct assignment. The trick is using the native property setter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleSeek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;sl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolveSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SELECTORS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seekSlider&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;setter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOwnPropertyDescriptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HTMLInputElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prototype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;value&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;setter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
    &lt;span class="nx"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;change&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bypasses React's synthetic event system and fires a real native event that Spotify's player actually listens to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shadow DOM — Fully Isolated
&lt;/h3&gt;

&lt;p&gt;The floating player is built inside a Shadow DOM. This means Spotify's CSS can't bleed in and break the player's styles, and the player's styles can't accidentally affect Spotify's page. Complete isolation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;div&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spotify-float-host&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;documentElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shadow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;attachShadow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;open&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Everything inside is fully encapsulated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The PiP Bug That Took Me a While to Crack
&lt;/h2&gt;

&lt;p&gt;The Document Picture-in-Picture API is relatively new (&lt;code&gt;window.documentPictureInPicture&lt;/code&gt;) and it does something unexpected: &lt;strong&gt;CSS &lt;code&gt;:hover&lt;/code&gt; doesn't work inside a PiP window&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The PiP window is a separate OS-level window with its own document. When your mouse is inside it, the main page's document doesn't receive hover events. So all the CSS I had like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nf"&gt;#player&lt;/span&gt;&lt;span class="nd"&gt;:hover&lt;/span&gt; &lt;span class="nf"&gt;#ctrl&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nf"&gt;#player&lt;/span&gt;&lt;span class="nd"&gt;:hover&lt;/span&gt; &lt;span class="nf"&gt;#pw&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...never fired. The controls were stuck invisible and unclickable. Forever.&lt;/p&gt;

&lt;p&gt;The fix was to stop relying on CSS hover entirely and switch to JS events registered directly on the PiP window's document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;pipWindow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mouseenter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pipPlayer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;pipPlayer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;classList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pip-hovered&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;pipWindow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mouseleave&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pipPlayer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;pipPlayer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;classList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pip-hovered&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then all the hover styles reference &lt;code&gt;.pip-hovered&lt;/code&gt; instead of &lt;code&gt;:hover&lt;/code&gt;. Simple fix once you know the root cause, but CSS &lt;code&gt;:hover&lt;/code&gt; silently not working across window contexts is not an obvious thing to debug.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;No frameworks. No build tools. No npm packages at runtime.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manifest V3&lt;/strong&gt; Chrome Extension&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vanilla JS&lt;/strong&gt; — single bundled IIFE in &lt;code&gt;content.js&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shadow DOM&lt;/strong&gt; for style encapsulation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Picture-in-Picture API&lt;/strong&gt; for the PiP window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;chrome.storage.local&lt;/code&gt;&lt;/strong&gt; for persisting position, size, mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt; (built-in &lt;code&gt;zlib&lt;/code&gt; only) for icon generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole thing is five files that Chrome loads directly. No webpack, no transpilation, no dependencies.&lt;/p&gt;

&lt;p&gt;One thing worth noting: Chrome MV3 content scripts &lt;strong&gt;cannot use ES module &lt;code&gt;import&lt;/code&gt;/&lt;code&gt;export&lt;/code&gt;&lt;/strong&gt; without some manifest workarounds that introduce their own issues. So the entire UI, selector system, and logic are bundled into one self-contained IIFE. If you're building a Chrome Extension and hitting &lt;code&gt;Uncaught SyntaxError: Cannot use import statement outside a module&lt;/code&gt; — that's why.&lt;/p&gt;




&lt;h2&gt;
  
  
  Features at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;How&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Play / Pause / Skip / Shuffle / Repeat&lt;/td&gt;
&lt;td&gt;DOM click simulation with retry backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seek&lt;/td&gt;
&lt;td&gt;Native range input setter + bubbling events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Volume&lt;/td&gt;
&lt;td&gt;Same native setter technique&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drag&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mousedown&lt;/code&gt; on handle → &lt;code&gt;mousemove&lt;/code&gt; on document, viewport-clamped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resize&lt;/td&gt;
&lt;td&gt;SE corner handle, 200–500px × 120–700px&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PiP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;window.documentPictureInPicture.requestWindow()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;chrome.storage.local&lt;/code&gt; — position, size, mode, visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyboard&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Space&lt;/code&gt;, &lt;code&gt;Ctrl+→&lt;/code&gt;, &lt;code&gt;Ctrl+←&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Get It
&lt;/h2&gt;

&lt;p&gt;The extension is open source on GitHub:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Arnab500th/Spotify-miniplayer-chrome-extension-By-pass-premium-pay-walls" rel="noopener noreferrer"&gt;github.com/Arnab500th/Spotify-miniplayer-chrome-extension-By-pass-premium-pay-walls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download the ZIP from the &lt;a href="https://github.com/Arnab500th/Spotify-miniplayer-chrome-extension-By-pass-premium-pay-walls/releases/latest" rel="noopener noreferrer"&gt;Releases page&lt;/a&gt;, unzip it, go to &lt;code&gt;chrome://extensions&lt;/code&gt;, enable Developer Mode, click Load unpacked, select the folder. Done.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Better selector recovery when Spotify does a major redesign&lt;/li&gt;
&lt;li&gt;Maybe a volume keyboard shortcut&lt;/li&gt;
&lt;li&gt;Possibly a lyrics overlay if I can figure out a non-API approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yes, I know the repo name is a bit on the nose. But it's accurate. 🏴‍☠️&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Not affiliated with Spotify AB. This is a personal project built for fun and learning. Use it responsibly.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>chromeextension</category>
      <category>spotify</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How We Built an AI Littering Detection System in 4 Days — and Won 2nd Place</title>
      <dc:creator>Arnab Datta</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:39:15 +0000</pubDate>
      <link>https://dev.to/arnab500th/how-we-built-an-ai-littering-detection-system-in-4-days-and-won-2nd-place-1d3e</link>
      <guid>https://dev.to/arnab500th/how-we-built-an-ai-littering-detection-system-in-4-days-and-won-2nd-place-1d3e</guid>
      <description>&lt;p&gt;We had 4 days, one laptop with an RTX 2050, and a problem nobody on our team had fully solved before. This is the story of building TRACE — and everything that broke along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is TRACE?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TRACE (Trash Recognition and Automated Civic Enforcement)&lt;/strong&gt; is a real-time AI surveillance pipeline that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects littering events across multiple live camera feeds&lt;/li&gt;
&lt;li&gt;Confirms offender identity using a 5-state behaviour machine&lt;/li&gt;
&lt;li&gt;Reads license plates via OCR for vehicle offenders&lt;/li&gt;
&lt;li&gt;Routes WhatsApp alerts with evidence snapshots to the &lt;strong&gt;nearest municipality ward office&lt;/strong&gt; using GPS distance&lt;/li&gt;
&lt;li&gt;Streams live annotated video to a real-time analytics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stack: Python · YOLOv8 · ByteTrack · EasyOCR · FastAPI · SQLite · Twilio · OpenCV · HTML/CSS/JS&lt;/p&gt;

&lt;p&gt;We won &lt;strong&gt;🥈 2nd place at NextGenHack 2026&lt;/strong&gt;. This was my first ever hackathon, first semester of college, competing against seniors.&lt;/p&gt;

&lt;p&gt;Here's what actually happened during those 4 days.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 1: Detecting &lt;em&gt;Behaviour&lt;/em&gt;, Not Just Objects
&lt;/h2&gt;

&lt;p&gt;The obvious approach — detect trash, flag it — doesn't work. Trash appears in a frame for a lot of reasons that aren't littering. Someone carrying a bag. A bin. A parked vehicle with litter near it. You'd get false alerts constantly.&lt;/p&gt;

&lt;p&gt;We looked at &lt;strong&gt;Human Action Recognition (HAR) models&lt;/strong&gt; first. The idea was to classify the action — "person dropping object" — directly. But every model we tested was either too slow for real-time inference on our hardware, trained on datasets that didn't cover littering specifically, or produced too many false positives on adjacent actions like "person placing object on surface."&lt;/p&gt;

&lt;p&gt;No perfect fit existed. So I designed something from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 5-State Machine
&lt;/h3&gt;

&lt;p&gt;Every tracked trash object moves through states independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UNKNOWN → CARRYING → SEPARATION → STATIONARY → ALERTED
                                ↘ CANCELLED (owner returns)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UNKNOWN&lt;/strong&gt;: Trash first appears. Looking for the nearest suspect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CARRYING&lt;/strong&gt;: Suspect within 150px of the object — assumed being carried.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEPARATION&lt;/strong&gt;: Suspect has moved away. Timer starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;STATIONARY&lt;/strong&gt;: Object hasn't moved more than 15px in 30+ frames since separation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ALERTED&lt;/strong&gt;: Suspect is beyond 200px — confirmed littering. Evidence captured, alert dispatched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CANCELLED&lt;/strong&gt;: Owner identified by ByteTrack ID returns — false alarm cleared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key detail: owner identity is verified using &lt;strong&gt;ByteTrack track IDs&lt;/strong&gt;, not just position. A different person walking near a stationary object doesn't cancel the alert. Without this, any passerby would reset the timer.&lt;/p&gt;

&lt;p&gt;This took hours of whiteboarding. Getting the transition logic right — especially the cancellation paths — was harder than the model training.&lt;/p&gt;
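&lt;p&gt;The real implementation isn't in this post, but the transition logic can be sketched roughly like this. Class and parameter names here are mine, not the project's; the thresholds (150px carry, 200px abandon, 15px over 30 frames for stationary) are the ones quoted above:&lt;/p&gt;

```python
# Rough sketch of the 5-state machine. Names are illustrative; thresholds
# come from the post (150px carry, 200px abandon, 15px/30-frame stationary).
CARRY_DIST, ABANDON_DIST = 150, 200
STATIONARY_MOVE, STATIONARY_FRAMES = 15, 30

class TrashTrack:
    def __init__(self, track_id):
        self.track_id = track_id
        self.state = "UNKNOWN"
        self.owner_id = None        # ByteTrack ID of the suspect, once assigned
        self.still_frames = 0

    def update(self, dist_to_owner, moved_px, nearest_person_id, nearest_dist):
        if self.state == "UNKNOWN" and nearest_dist is not None and CARRY_DIST >= nearest_dist:
            # Trash just appeared near someone: assume they are carrying it.
            self.owner_id, self.state = nearest_person_id, "CARRYING"
        elif self.state == "CARRYING" and dist_to_owner is not None and dist_to_owner > CARRY_DIST:
            # Owner walked away from the object: start the stationary timer.
            self.state, self.still_frames = "SEPARATION", 0
        elif self.state == "SEPARATION":
            self.still_frames = self.still_frames + 1 if STATIONARY_MOVE >= moved_px else 0
            if self.still_frames >= STATIONARY_FRAMES:
                self.state = "STATIONARY"
        elif self.state == "STATIONARY":
            # Only the original owner's track ID can cancel; a passerby cannot.
            if nearest_person_id == self.owner_id and nearest_dist is not None and CARRY_DIST >= nearest_dist:
                self.state = "CANCELLED"
            elif dist_to_owner is None or dist_to_owner > ABANDON_DIST:
                self.state = "ALERTED"
        return self.state
```

&lt;p&gt;The &lt;code&gt;STATIONARY&lt;/code&gt; branch is where the ByteTrack ID check lives: the cancel path fires only for the recorded owner ID.&lt;/p&gt;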




&lt;h2&gt;
  
  
  Problem 2: Single Camera to Multi-Camera
&lt;/h2&gt;

&lt;p&gt;Getting one camera working was straightforward. Getting three to run simultaneously without everything collapsing was a different problem entirely.&lt;/p&gt;

&lt;p&gt;The issues hit in layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Threading&lt;/strong&gt;: Each camera needs its own detection loop. Python's GIL means threads alone can't give true parallelism for CPU-bound work; inference, though, is largely GPU-bound, and the GIL is released while kernels run, so threads are enough here. We moved to one thread per camera, each with its own model instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared state&lt;/strong&gt;: ByteTrack maintains tracking state across frames. If two cameras share a tracker, their track IDs collide and the state machine breaks completely. Solution: each camera thread gets its own ByteTrack instance. No sharing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MJPEG streaming&lt;/strong&gt;: The dashboard needs live video. Naive implementation — encode frame, POST to backend, serve — blocks the detection loop and tanks FPS. We decoupled it: a separate sender thread reads from a shared frame buffer and POSTs independently. The detection loop writes one frame to the buffer (microseconds) and moves on. If the backend is slow, the sender skips to the latest frame. Detection runs at full GPU speed regardless.&lt;/p&gt;
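&lt;p&gt;The pattern is worth spelling out, because it applies to any fast producer feeding a slow consumer. A minimal sketch of the latest-frame buffer (names are mine, not the project's):&lt;/p&gt;

```python
import threading

class FrameBuffer:
    """Holds only the newest frame; the writer never waits on the network."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame, self._seq = None, 0

    def put(self, frame):
        # Called by the detection loop: just a locked assignment (microseconds).
        with self._lock:
            self._frame, self._seq = frame, self._seq + 1

    def get_latest(self, last_seq):
        # Called by the sender; returns (None, last_seq) if nothing new arrived.
        with self._lock:
            if self._seq == last_seq:
                return None, last_seq
            return self._frame, self._seq

def sender_loop(buf, post_fn, stop_event):
    """Posts whatever frame is newest; a slow backend just means skipped frames."""
    seq = 0
    while not stop_event.is_set():
        frame, new_seq = buf.get_latest(seq)
        if frame is not None:
            post_fn(frame)        # e.g. an HTTP POST to the dashboard backend
            seq = new_seq
        stop_event.wait(0.03)     # roughly a 30 fps upper bound on the sender
```

&lt;p&gt;If the backend stalls, frames simply pile up as overwrites in the buffer and the sender resumes from the newest one; detection FPS never notices.&lt;/p&gt;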




&lt;h2&gt;
  
  
  Problem 3: Round 2 Surprise — Add Geofencing
&lt;/h2&gt;

&lt;p&gt;Midway through the hackathon, the judges told us to add geofencing. New requirement, mid-build.&lt;/p&gt;

&lt;p&gt;The goal: instead of hardcoding a phone number per camera, alerts should automatically route to the &lt;strong&gt;nearest municipality ward office&lt;/strong&gt; based on the camera's GPS coordinates.&lt;/p&gt;

&lt;p&gt;My first instinct was Euclidean distance — just subtract the coordinates. That's wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1 degree of latitude ≈ 111 km&lt;/strong&gt;, and a degree of longitude shrinks as you move away from the equator. Raw degree subtraction treats coordinates as flat 2D points, which gives distorted distances at any real-world scale: a camera 200 metres from an office can rank farther than one 2 km away, depending on which direction you measure.&lt;/p&gt;

&lt;p&gt;The correct formula is &lt;strong&gt;Haversine&lt;/strong&gt;, which accounts for the Earth's curvature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;haversine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6_371_000&lt;/span&gt;  &lt;span class="c1"&gt;# Earth radius in metres
&lt;/span&gt;    &lt;span class="n"&gt;phi1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phi2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;radians&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;radians&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dphi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;radians&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lat1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dlambda&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;radians&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lng2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lng1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dphi&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phi1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phi2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dlambda&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;atan2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;nearest_office(cam_lat, cam_lng)&lt;/code&gt; iterates through every entry in &lt;code&gt;MUNICIPALITY_OFFICES&lt;/code&gt;, computes Haversine distance, and returns the closest one. Adding a new ward office requires one dict entry in config. No camera config changes needed — routing updates automatically.&lt;/p&gt;
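&lt;p&gt;For completeness, here's roughly what that lookup looks like. The &lt;code&gt;MUNICIPALITY_OFFICES&lt;/code&gt; entries below are made-up placeholders, not the real config:&lt;/p&gt;

```python
import math

def haversine(lat1, lng1, lat2, lng2):
    # Same formula as above: great-circle distance in metres.
    R = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Placeholder coordinates; real entries would also carry a WhatsApp number.
MUNICIPALITY_OFFICES = {
    "ward_12": {"lat": 22.5726, "lng": 88.3639},
    "ward_15": {"lat": 22.6012, "lng": 88.4120},
}

def nearest_office(cam_lat, cam_lng):
    """Return the (name, record) of the office closest to the camera."""
    best_name, best_dist = None, float("inf")
    for name, office in MUNICIPALITY_OFFICES.items():
        d = haversine(cam_lat, cam_lng, office["lat"], office["lng"])
        if best_dist > d:
            best_name, best_dist = name, d
    return best_name, MUNICIPALITY_OFFICES[best_name]
```

&lt;p&gt;A linear scan is fine at municipal scale; with thousands of offices you'd reach for a spatial index instead.&lt;/p&gt;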

&lt;p&gt;We also added &lt;strong&gt;high sensitivity zones&lt;/strong&gt; — schools, stations, heritage sites — where cameras within a defined radius never drop below MEDIUM priority surveillance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 4: The GPU Was Choking
&lt;/h2&gt;

&lt;p&gt;More cameras meant the GPU was hitting its ceiling. On an RTX 2050, we could run 4 cameras at full inference before FPS started dropping hard.&lt;/p&gt;

&lt;p&gt;I looked at standard rate-control approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token bucket&lt;/strong&gt;: Solves contention between producers sharing one resource. But each camera owns its own thread and model instances — there's no shared queue to arbitrate. Doesn't fit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame differencing&lt;/strong&gt;: Gates inference on pixel-change detection. Sounds good, but lighting changes, wind, insects — all produce false triggers. Also creates irregular frame gaps that ByteTrack's &lt;code&gt;persist=True&lt;/code&gt; wasn't designed for, breaking track continuity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'd actually had simple frame skipping in an earlier version — run detection every Nth frame regardless of what's happening. We scrapped it because it broke tracking. ByteTrack needs consistent temporal input to maintain IDs reliably.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Dynamic Priority System
&lt;/h3&gt;

&lt;p&gt;The insight: most cameras are idle most of the time. A camera pointed at an empty street at 2am doesn't need the same inference rate as one that just detected a littering event.&lt;/p&gt;

&lt;p&gt;Each camera thread tracks time since its last confirmed trash detection and assigns itself a priority:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_camera_skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_trash_time&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;PRIORITY_HIGH_WINDOW&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# 5 seconds
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;PRIORITY_HIGH_SKIP&lt;/span&gt;         &lt;span class="c1"&gt;# skip=1, every frame
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;PRIORITY_MEDIUM_WINDOW&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# 30 seconds
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;PRIORITY_MEDIUM_SKIP&lt;/span&gt;       &lt;span class="c1"&gt;# skip=5
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;PRIORITY_LOW_SKIP&lt;/span&gt;          &lt;span class="c1"&gt;# skip=8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key design decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trash model runs every frame regardless&lt;/strong&gt; — only person detection is skipped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cameras start at LOW automatically&lt;/strong&gt; — &lt;code&gt;last_trash_time=0.0&lt;/code&gt; means elapsed ≈ 1.7 billion seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipped frames reuse &lt;code&gt;last_known_persons&lt;/code&gt; cache&lt;/strong&gt; — ByteTrack state is preserved between detection frames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority transitions POST to backend only on change&lt;/strong&gt; — not every frame&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: went from 4 cameras at full load to &lt;strong&gt;6-9 cameras&lt;/strong&gt; on the same RTX 2050.&lt;/p&gt;

&lt;p&gt;The difference from the old frame skipping: this version is &lt;em&gt;activity-aware&lt;/em&gt;. It doesn't skip blindly on a fixed schedule — it skips based on what's actually happening in the scene. A camera that just detected a littering event immediately jumps to HIGH (every frame) for 5 seconds. An idle camera at LOW still runs the trash model every frame, just not person detection.&lt;/p&gt;
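&lt;p&gt;Putting the pieces together, the per-frame flow looks roughly like this. Function and field names are illustrative, not the project's actual API, and &lt;code&gt;get_camera_skip&lt;/code&gt; is restated from above so the snippet runs standalone:&lt;/p&gt;

```python
import time

PRIORITY_HIGH_WINDOW, PRIORITY_MEDIUM_WINDOW = 5, 30          # seconds
PRIORITY_HIGH_SKIP, PRIORITY_MEDIUM_SKIP, PRIORITY_LOW_SKIP = 1, 5, 8

def get_camera_skip(ctx):
    elapsed = time.time() - ctx.last_trash_time
    if PRIORITY_HIGH_WINDOW > elapsed:
        return PRIORITY_HIGH_SKIP
    if PRIORITY_MEDIUM_WINDOW > elapsed:
        return PRIORITY_MEDIUM_SKIP
    return PRIORITY_LOW_SKIP

class CameraCtx:
    def __init__(self):
        self.last_trash_time = 0.0     # epoch 0, so cameras boot at LOW
        self.last_known_persons = []   # cache reused on skipped frames

def process_frame(ctx, frame, frame_idx, trash_model, person_model):
    trash = trash_model(frame)                        # trash runs every frame
    if frame_idx % get_camera_skip(ctx) == 0:
        ctx.last_known_persons = person_model(frame)  # fresh person detections
    if trash:
        ctx.last_trash_time = time.time()             # bumps this camera to HIGH
    return trash, ctx.last_known_persons
```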




&lt;h2&gt;
  
  
  The Dashboard Problem (JS with No JS Experience)
&lt;/h2&gt;

&lt;p&gt;None of our team were JavaScript developers. The dashboard needed to be live, multi-camera, handle MJPEG streams, update charts every 5 seconds, and look presentable to judges.&lt;/p&gt;

&lt;p&gt;We deliberately chose &lt;strong&gt;plain HTML/CSS/JS&lt;/strong&gt; — no React, no build step, no npm. Zero risk of build failures mid-demo. It opens directly in any browser and polls the FastAPI backend every 5 seconds.&lt;/p&gt;

&lt;p&gt;Chart.js for the graphs. Native &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tags for MJPEG streams — the browser handles multipart decode natively, no JS required. &lt;code&gt;fetch()&lt;/code&gt; for everything else. It works. It held up through the entire demo.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Shipped
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-camera real-time detection (threaded, one worker per camera)&lt;/li&gt;
&lt;li&gt;5-state littering behaviour machine with ByteTrack ID-based owner verification&lt;/li&gt;
&lt;li&gt;EasyOCR license plate recognition with Indian format validation&lt;/li&gt;
&lt;li&gt;Haversine geofencing — nearest ward office routing&lt;/li&gt;
&lt;li&gt;Dynamic HIGH/MEDIUM/LOW priority inference system&lt;/li&gt;
&lt;li&gt;imgbb snapshot upload → Twilio WhatsApp alert with zone label&lt;/li&gt;
&lt;li&gt;FastAPI backend, SQLite, MJPEG streaming&lt;/li&gt;
&lt;li&gt;Live dashboard with priority badges and zone sensitivity indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Model performance:&lt;/strong&gt; YOLOv8s fine-tuned on TACO dataset, mAP50 = 0.81&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;The state machine thresholds (150px carry distance, 200px abandon distance) were tuned empirically on test videos. They work, but they're pixel-based — which means they're resolution and camera-angle dependent. A proper implementation would normalize by estimated person height in frame.&lt;/p&gt;
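&lt;p&gt;One way to do that normalization, with factors I haven't tuned and show only to make the idea concrete:&lt;/p&gt;

```python
# Hypothetical fix: express thresholds as multiples of the suspect's
# bounding-box height instead of raw pixels, so they survive changes in
# resolution and camera distance. Factors are illustrative, not tuned.
CARRY_FACTOR = 0.85     # roughly 150px for a ~175px-tall detection
ABANDON_FACTOR = 1.15   # roughly 200px at the same scale

def carry_threshold_px(person_box_height_px):
    return CARRY_FACTOR * person_box_height_px

def abandon_threshold_px(person_box_height_px):
    return ABANDON_FACTOR * person_box_height_px
```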

&lt;p&gt;The &lt;code&gt;seen_trash_ids&lt;/code&gt; set that tracks confirmed events is never pruned. Over a long session it grows indefinitely. Simple fix with a timestamp-based TTL, just didn't make the hackathon cut.&lt;/p&gt;
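&lt;p&gt;That TTL fix is only a few lines: swap the set for a dict of timestamps. The TTL value below is a guess, not something we measured:&lt;/p&gt;

```python
import time

TRASH_ID_TTL = 600.0   # seconds; illustrative value, not from the project

# Replace the unbounded set with {track_id: last_seen_timestamp}.
seen_trash_ids = {}

def mark_seen(track_id, now=None):
    seen_trash_ids[track_id] = time.time() if now is None else now

def prune_seen(now=None):
    # Drop entries that haven't been seen within the TTL window.
    now = time.time() if now is None else now
    stale = [tid for tid, ts in seen_trash_ids.items() if now - ts > TRASH_ID_TTL]
    for tid in stale:
        del seen_trash_ids[tid]
```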

&lt;p&gt;Frame differencing as a &lt;em&gt;complement&lt;/em&gt; to the priority system — gating the trash model on truly static scenes — would be the next meaningful optimization. The priority system handles person detection well. The trash model still runs every frame regardless.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/Arnab500th/Hackathon-Automated-Littering-Detection-and-Alert-System-for-Public-Spaces" rel="noopener noreferrer"&gt;github.com/Arnab500th/Hackathon-Automated-Littering-Detection-and-Alert-System-for-Public-Spaces&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo&lt;/strong&gt; (Round 1 prototype showcase): &lt;a href="https://youtu.be/U9AvOBRZ0JI" rel="noopener noreferrer"&gt;youtu.be/U9AvOBRZ0JI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0raps1ale0vntn7gm232.jpg" alt="Sample Person detected Snapshot"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnlmjigr42sj4x0ebcrc.png" alt="Dashboard"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupqe7orbwht1dr3if7gz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupqe7orbwht1dr3if7gz.png" alt="Live Feed"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;First hackathon. First semester. If you're a first-year student reading this wondering whether to enter one — just enter it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>hackathon</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
