<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kiran Baby</title>
    <description>The latest articles on DEV Community by Kiran Baby (@kiranbaby14).</description>
    <link>https://dev.to/kiranbaby14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F744589%2F12dd11a6-9fe2-4dd2-b2d1-262e7f7e7567.jpeg</url>
      <title>DEV Community: Kiran Baby</title>
      <link>https://dev.to/kiranbaby14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kiranbaby14"/>
    <language>en</language>
    <item>
      <title>I built a real-time 3D map of London Underground trains</title>
      <dc:creator>Kiran Baby</dc:creator>
      <pubDate>Sat, 04 Apr 2026 15:33:45 +0000</pubDate>
      <link>https://dev.to/kiranbaby14/i-built-a-real-time-3d-map-of-london-underground-trains-4fl2</link>
      <guid>https://dev.to/kiranbaby14/i-built-a-real-time-3d-map-of-london-underground-trains-4fl2</guid>
      <description>&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;TFL (Transport for London) publishes live arrival predictions for every tube train through their Unified API. The data is free and I thought: what if I could take those predictions and actually &lt;em&gt;show&lt;/em&gt; every train moving across London in real time, on a 3D map?&lt;/p&gt;

&lt;p&gt;Turns out you can. But the gap between "TFL gives you arrival times" and "smooth 3D trains gliding along accurate track geometry" is way bigger than I expected. This post is about everything that lives in that gap.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwl1z3riwq5tczgfpor0h.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwl1z3riwq5tczgfpor0h.gif" alt="3D map of London showing live tube trains moving along their routes in real time" width="600" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://minilondon3d.xyz" rel="noopener noreferrer"&gt;minilondon3d.xyz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can tap any train to see its full route and upcoming stops, or tap a station to see all approaching trains with live countdowns. There's also a service status panel showing disruptions across all lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture overview
&lt;/h2&gt;

&lt;p&gt;The system has three main pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Python worker&lt;/strong&gt; that polls the TFL API every 60 seconds, processes the raw data, and writes to Redis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A FastAPI server&lt;/strong&gt; that reads from Redis and pushes updates to frontends over WebSocket&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Next.js frontend&lt;/strong&gt; that receives train data, animates positions along polylines using &lt;code&gt;requestAnimationFrame&lt;/code&gt;, and renders everything with Three.js on top of MapLibre GL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The worker and API server are separate processes. They communicate entirely through Redis: the worker writes cached train data and publishes updates on a pub/sub channel, the API server subscribes and relays to WebSocket clients. This means I can restart either one independently, and the API server stays completely stateless.&lt;/p&gt;
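&lt;p&gt;A minimal sketch of the message the worker might hand to Redis (the cache key, channel name, and payload fields below are invented for illustration, not taken from the project):&lt;/p&gt;

```python
import json
import time

# Hypothetical shape of one worker update. In the real system the worker
# would also SET this under a cache key and PUBLISH it on a pub/sub
# channel; the key and channel names here are illustrative.
CHANNEL = "trains:updates"

def build_update(line_id, trains):
    """Return (cache_key, message) for one line's refreshed train set."""
    payload = {
        "line": line_id,
        "generated_at": int(time.time()),
        "trains": trains,
    }
    # Worker side:  r.set(key, message); r.publish(CHANNEL, message)
    # API side:     relay every message on CHANNEL to WebSocket clients
    return f"trains:{line_id}", json.dumps(payload)
```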

&lt;h2&gt;
  
  
  Building the foundation: static route data
&lt;/h2&gt;

&lt;p&gt;Before a single train can be placed on the map, the system needs two things: accurate track geometry (the physical shape of each line on a map) and a station coordinate index (where every station is and how stations are ordered along each line). These come from completely different sources and get loaded once at startup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Track geometry: GeoJSON
&lt;/h3&gt;

&lt;p&gt;TFL's own route endpoint only gives you straight lines between stations, which looks terrible on a map. Real tube lines curve, run parallel, split at junctions. I got accurate geometry from &lt;a href="https://github.com/oobrien/vis/blob/master/tubecreature/data/tfl_lines.json" rel="noopener noreferrer"&gt;Oliver O'Brien's GeoJSON file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The problem is it comes as many tiny disconnected line fragments. The server's first job at startup is chaining these into continuous polylines. The algorithm picks a fragment, scans for others whose start or end matches within about 5 meters, flips and chains them together, and repeats until nothing else connects. This reduces something like 47 fragments down to 3-5 continuous polylines per line (roughly one per branch).&lt;/p&gt;
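&lt;p&gt;The chaining step can be sketched like this (a simplified version: the ~5 m tolerance is approximated in degrees, and real code would use a spatial index rather than a linear scan):&lt;/p&gt;

```python
TOL = 5 / 111_000  # ~5 metres expressed in degrees of latitude

def close(a, b, tol=TOL):
    """Endpoint match within tolerance."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

def chain_fragments(fragments):
    """fragments: list of [(lon, lat), ...]. Returns merged polylines."""
    pool = [list(f) for f in fragments]
    chains = []
    while pool:
        chain = pool.pop()
        grew = True
        while grew:  # keep absorbing fragments until nothing connects
            grew = False
            for i, frag in enumerate(pool):
                if close(chain[-1], frag[0]):       # append as-is
                    chain += frag[1:]
                elif close(chain[-1], frag[-1]):    # append flipped
                    chain += frag[-2::-1]
                elif close(chain[0], frag[-1]):     # prepend as-is
                    chain = frag[:-1] + chain
                elif close(chain[0], frag[0]):      # prepend flipped
                    chain = frag[:0:-1] + chain
                else:
                    continue
                pool.pop(i)
                grew = True
                break
        chains.append(chain)
    return chains
```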

&lt;h3&gt;
  
  
  Station data: TFL's Route/Sequence endpoint
&lt;/h3&gt;

&lt;p&gt;The live arrivals endpoint gives you &lt;code&gt;naptanId&lt;/code&gt;, &lt;code&gt;stationName&lt;/code&gt;, &lt;code&gt;timeToStation&lt;/code&gt;, and other prediction data for each upcoming stop, but &lt;em&gt;no coordinates&lt;/em&gt;. No lat, no lon. If you want to know where King's Cross actually is on a map, you need a separate data source.&lt;/p&gt;

&lt;p&gt;That source is TFL's &lt;code&gt;/Line/{id}/Route/Sequence/{direction}&lt;/code&gt; endpoint, which I fetch for every line in both directions at startup. This endpoint returns two important structures:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;stopPointSequences&lt;/code&gt;&lt;/strong&gt; is where station coordinates and branch topology live. Each entry in this array represents a &lt;em&gt;branch&lt;/em&gt;: a segment of track between junctions. For a simple line like Victoria, there's basically one branch. For the Northern line, there are several.&lt;/p&gt;

&lt;p&gt;Each branch contains an ordered list of stop points with their naptan IDs, names, coordinates, zones, disruption flags, and which other lines serve that station. It also contains &lt;code&gt;nextBranchIds&lt;/code&gt; and &lt;code&gt;prevBranchIds&lt;/code&gt;, which describe how branches connect at junctions. This is essentially a graph of the line's topology.&lt;/p&gt;

&lt;p&gt;I use this branch graph for validation. Before attempting polyline extrapolation between two consecutive stops, I check that both stops appear in the same branch. If they don't, the two stops straddle a junction and polyline math would project the train onto the wrong branch.&lt;/p&gt;
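&lt;p&gt;The check itself is tiny once the topology is loaded. A toy stand-in (the station codes below are invented placeholders, not real naptan IDs):&lt;/p&gt;

```python
# Toy stand-in for the branch topology built from stopPointSequences.
branches = {
    "branch-1": ["AAA", "BBB", "CCC"],
    "branch-2": ["CCC", "DDD"],  # joins branch-1 at the junction CCC
}

def same_branch(stop_a, stop_b):
    """True if both stops appear in one branch's stop list."""
    return any(stop_a in stops and stop_b in stops
               for stops in branches.values())

# same_branch("AAA", "BBB") -> True: safe to extrapolate along the polyline
# same_branch("BBB", "DDD") -> False: the pair straddles a junction
```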

&lt;p&gt;&lt;strong&gt;&lt;code&gt;orderedLineRoutes&lt;/code&gt;&lt;/strong&gt; gives complete end-to-end route variants. While &lt;code&gt;stopPointSequences&lt;/code&gt; gives you the physical branch segments, &lt;code&gt;orderedLineRoutes&lt;/code&gt; gives you the full journeys that trains actually make. Each variant is a name and an ordered list of naptan IDs covering the full route from origin to terminus.&lt;/p&gt;

&lt;p&gt;These variants are critical for data cleaning. When a train shows up with a list of predicted stops, I match it against all variants by finding the one whose naptan ID list contains the most of the train's stops. This tells me which specific route the train is running, which in turn gives me the correct geographical ordering of stations. That ordering becomes the source of truth for detecting and filtering out stale or inconsistent predictions.&lt;/p&gt;
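&lt;p&gt;The matching is just a best-overlap score. A sketch (variant names and IDs below are made up):&lt;/p&gt;

```python
def match_variant(train_stops, variants):
    """variants: {variant_name: [naptan ids in geographical order]}.
    Pick the variant whose naptan list contains the most of the
    train's predicted stops."""
    def overlap(naptans):
        idset = set(naptans)
        return sum(1 for stop in train_stops if stop in idset)
    return max(variants, key=lambda name: overlap(variants[name]))

# The winning variant's ordered list then fixes the geographical station
# ordering used later to filter stale predictions.
```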

&lt;p&gt;All of this gets built into an in-memory station coordinate index (keyed by both naptan ID and normalized station name), a set of route variant definitions, and a branch topology graph. The whole thing gets cached in Redis so it survives restarts without re-fetching from TFL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-computed segment table
&lt;/h3&gt;

&lt;p&gt;With both the track geometry and station positions loaded, the server does one more thing at startup: it walks every route variant's station list pairwise, snaps each station onto the polylines, slices the polyline between each consecutive pair of stations, and stores the resulting geometry. Both forward and reverse directions are cached.&lt;/p&gt;

&lt;p&gt;When a user later clicks a train and requests its full route path, the server just concatenates pre-computed segments. No geometry work at request time. This brought the path endpoint from 200-400ms down to essentially a dictionary lookup.&lt;/p&gt;
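&lt;p&gt;At request time, the lookup amounts to concatenating cached slices (the keys and coordinates below are illustrative):&lt;/p&gt;

```python
# Pre-computed at startup: (from_naptan, to_naptan) -> sliced polyline.
segments = {
    ("A", "B"): [(0.0, 0.0), (0.1, 0.1)],
    ("B", "C"): [(0.1, 0.1), (0.2, 0.1), (0.3, 0.2)],
}

def route_path(stops):
    """Concatenate cached slices for consecutive station pairs."""
    path = []
    for a, b in zip(stops, stops[1:]):
        seg = segments[(a, b)]
        path += seg if not path else seg[1:]  # skip the duplicated joint
    return path
```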

&lt;h2&gt;
  
  
  The hard part: where is each train?
&lt;/h2&gt;

&lt;p&gt;Now we have geometry, station coordinates, route variants, and branch topology all loaded. The worker starts polling TFL's arrivals endpoint every 60 seconds, which returns raw predictions for every active train across all lines.&lt;/p&gt;

&lt;p&gt;But remember: TFL doesn't give you GPS coordinates for trains. What you get is a list of arrival predictions: "Vehicle X will arrive at Y in Z seconds".&lt;/p&gt;

&lt;p&gt;For each prediction, we know the naptan ID of the station, but we look up that station's lat/lon from our own pre-built station coordinate index. TFL never tells us where the &lt;em&gt;train&lt;/em&gt; is. We have to figure that out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Polyline-based extrapolation (the good one)
&lt;/h3&gt;

&lt;p&gt;Take the train's next two upcoming stops. Look up their coordinates from our station index. Snap both onto the line's track polylines. If they both land on the same polyline segment, extrapolate &lt;em&gt;backward&lt;/em&gt; from the first stop. The frontend then smooths between successive positions with interpolated animations.&lt;/p&gt;

&lt;p&gt;Before doing this, the system checks the branch graph to confirm both stops are on the same branch. This prevents the math from projecting Northern line trains onto the wrong branch at junctions.&lt;/p&gt;
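&lt;p&gt;Reduced to distance-along-track, the extrapolation looks something like this (a 1D sketch assuming constant speed between the two stops; the real code works on snapped polyline offsets):&lt;/p&gt;

```python
def extrapolate_offset(d1, d2, t1, t2):
    """d1/d2: track offsets (metres) of the next two stops;
    t1/t2: their timeToStation values (seconds)."""
    if t2 <= t1:
        return d1  # degenerate predictions: park at the next stop
    speed = (d2 - d1) / (t2 - t1)      # implied speed between the stops
    return max(0.0, d1 - speed * t1)   # walk backward from the first stop
```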

&lt;h3&gt;
  
  
  Straight-line extrapolation (fallback)
&lt;/h3&gt;

&lt;p&gt;When polyline snapping fails (the two stops land on different polyline segments, or there's a junction crossing between them), I fall back to simple linear extrapolation between the two station coordinates. Less accurate, but it keeps the train roughly where it should be.&lt;/p&gt;

&lt;p&gt;Both approaches require at least two upcoming stops with valid coordinates. In the rare cases where that's not available (a train approaching its terminus with only one prediction left, or a station coordinate lookup failure), the system falls back to parsing TFL's &lt;code&gt;currentLocation&lt;/code&gt; text (strings like &lt;code&gt;"Between King's Cross and Angel"&lt;/code&gt; or &lt;code&gt;"At Finsbury Park"&lt;/code&gt;), and as a last resort, just places the train at its nearest upcoming stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Snapping to tracks (and not the wrong track)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bearing-aware snapping
&lt;/h3&gt;

&lt;p&gt;Here's a problem I didn't anticipate. The Northern line has two branches through central London (Bank and Charing Cross) that run geographically very close together. A naive "snap to nearest polyline" approach would sometimes snap a Bank branch train onto the Charing Cross tracks, because they're only a few hundred meters apart.&lt;/p&gt;

&lt;p&gt;The fix: bearing-aware snapping. When snapping a point, I also pass the estimated direction of travel. The algorithm scores each candidate segment by combining distance &lt;em&gt;and&lt;/em&gt; bearing alignment, with the bearing penalty weighted 3x relative to distance. A segment that's farther away but aligned with the train's direction of travel beats a closer segment that's angled off. Anything more than 60 degrees off the train's bearing gets rejected entirely before scoring.&lt;/p&gt;
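&lt;p&gt;A toy version of the scoring (units and exact weights are illustrative; the real implementation scores polyline segments, not labeled candidates):&lt;/p&gt;

```python
def bearing_diff(a, b):
    """Smallest absolute angle between two bearings, in degrees (0-180)."""
    return abs((a - b + 180) % 360 - 180)

def best_segment(candidates, train_bearing):
    """candidates: list of (segment_id, distance_m, segment_bearing)."""
    best = None
    for seg_id, dist_m, seg_bearing in candidates:
        diff = bearing_diff(seg_bearing, train_bearing)
        if diff > 60:
            continue  # hard reject: pointing the wrong way
        score = dist_m + 3 * diff  # bearing penalty weighted 3x
        if best is None or score < best[0]:
            best = (score, seg_id)
    return best[1] if best else None

# A closer segment angled 90 degrees off loses to a farther, aligned one:
# best_segment([("bank", 100, 90), ("charing", 300, 5)], 0) -> "charing"
```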

&lt;p&gt;This single change fixed most of the "train on wrong branch" bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  TFL data quality issues (there are many)
&lt;/h2&gt;

&lt;p&gt;Working with TFL arrival predictions taught me that real-time transit data is messy in ways I really didn't expect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Duplicate vehicle IDs across lines
&lt;/h3&gt;

&lt;p&gt;TFL reuses numeric vehicle IDs across different lines. Vehicle "240" on Bakerloo and "240" on Piccadilly are completely different physical trains. If you group predictions by just &lt;code&gt;vehicleId&lt;/code&gt;, you get Frankenstein trains with stops on two different lines. I group by &lt;code&gt;(vehicleId, lineId)&lt;/code&gt; and create composite IDs like &lt;code&gt;bakerloo_240&lt;/code&gt; to keep them separate.&lt;/p&gt;
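&lt;p&gt;The grouping is one composite key away (field names below mirror the TFL prediction fields mentioned above):&lt;/p&gt;

```python
from collections import defaultdict

def group_trains(predictions):
    """Group by (lineId, vehicleId) so vehicle "240" on two different
    lines never merges into one phantom train."""
    trains = defaultdict(list)
    for p in predictions:
        trains[f"{p['lineId']}_{p['vehicleId']}"].append(p)
    return dict(trains)
```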

&lt;h3&gt;
  
  
  DLR has no vehicle IDs at all
&lt;/h3&gt;

&lt;p&gt;DLR trains are driverless and TFL doesn't assign them vehicle IDs in the arrivals endpoint. Every single prediction comes with &lt;code&gt;vehicleId: "000"&lt;/code&gt;. To get DLR trains on the map, I synthesize unique IDs by combining the line, destination, direction, and a rank within each station group. It's a hack, but it works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stale predictions and mixed snapshots
&lt;/h3&gt;

&lt;p&gt;This was the nastiest data quality issue and it took me the longest to figure out.&lt;/p&gt;

&lt;p&gt;You get situations where a train has already passed a station, but the API still reports a low &lt;code&gt;timeToStation&lt;/code&gt; prediction for it. Or the same train has predictions from two different moments in time, so the time-ordered stop list disagrees with the actual geographical order of the stations.&lt;/p&gt;

&lt;p&gt;If you just sort by &lt;code&gt;timeToStation&lt;/code&gt; and trust it, you get trains that appear to jump backward or zigzag.&lt;/p&gt;

&lt;p&gt;The fix uses the route variant's geographical station ordering as the source of truth (this is why resolving the variant first matters so much). First, I drop any stops that are geographically behind the train's anchor position along the variant. Next, I sort the remaining stops by their position along the route, not by time. Then I walk backward through the list and drop any stop whose time exceeds a later stop's time (a reverse monotonic filter). Finally, there's a plausibility check: if the first stop is many stations away from the second but only a few seconds apart in time, the first one is stale and gets dropped. A tube train needs at least some minimum travel time per station, so "5 stations in 20 seconds" is obviously wrong.&lt;/p&gt;
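&lt;p&gt;The steps above can be sketched as one cleaning pass. Here &lt;code&gt;order&lt;/code&gt; maps naptan ID to its index along the resolved route variant, and the minimum-seconds-per-station constant is an assumed value, not a figure from the project:&lt;/p&gt;

```python
MIN_SECS_PER_STATION = 45  # assumed plausibility constant

def clean_stops(stops, order, anchor_index):
    # 1. drop stops geographically behind the train's anchor position
    ahead = [s for s in stops if order[s["naptanId"]] >= anchor_index]
    # 2. sort by position along the route, not by predicted time
    ahead.sort(key=lambda s: order[s["naptanId"]])
    # 3. reverse monotonic filter: drop any stop whose time exceeds a
    #    later stop's time
    kept, min_later = [], float("inf")
    for s in reversed(ahead):
        if s["timeToStation"] <= min_later:
            kept.append(s)
            min_later = s["timeToStation"]
    kept.reverse()
    # 4. plausibility: a first stop several stations before the second
    #    but only seconds earlier is a stale prediction
    if len(kept) >= 2:
        gap = order[kept[1]["naptanId"]] - order[kept[0]["naptanId"]]
        dt = kept[1]["timeToStation"] - kept[0]["timeToStation"]
        if gap >= 2 and dt < gap * MIN_SECS_PER_STATION:
            kept = kept[1:]
    return kept
```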

&lt;h3&gt;
  
  
  Duplicate platform predictions
&lt;/h3&gt;

&lt;p&gt;TFL returns one prediction per &lt;em&gt;platform&lt;/em&gt;, not per station. At shared termini you can get 5+ predictions for a single stop because TFL broadcasts to all possible platforms before the platform is assigned. I deduplicate by &lt;code&gt;naptanId&lt;/code&gt; to keep one entry per station.&lt;/p&gt;
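&lt;p&gt;A small sketch of the deduplication (keeping the earliest prediction per &lt;code&gt;naptanId&lt;/code&gt; is my assumption here; the post only says it deduplicates by that key):&lt;/p&gt;

```python
def dedupe_platforms(predictions):
    """Collapse per-platform predictions to one per station."""
    best = {}
    for p in predictions:
        cur = best.get(p["naptanId"])
        if cur is None or p["timeToStation"] < cur["timeToStation"]:
            best[p["naptanId"]] = p
    return list(best.values())
```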

&lt;h3&gt;
  
  
  Station name inconsistencies
&lt;/h3&gt;

&lt;p&gt;TFL spells the same station differently across endpoints and lines. I built a normalizer to handle this, mainly so the station index doesn't end up with duplicate entries and so display names stay consistent across the UI. It also serves as a fallback for the rare cases where a naptan ID lookup fails and the system has to match by station name instead.&lt;/p&gt;
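&lt;p&gt;The post doesn't list the normalizer's exact rules, so this only shows the shape such a function might take: lowercase, drop apostrophes, strip "Station" suffixes, collapse punctuation and whitespace:&lt;/p&gt;

```python
import re

def normalize_station(name):
    """Illustrative normalizer; the project's actual rules may differ."""
    n = name.lower().replace("'", "")
    n = re.sub(r"\s*\b(underground|rail|dlr)?\s*station\b", "", n)
    n = re.sub(r"[^a-z0-9]+", " ", n)  # collapse punctuation/whitespace
    return n.strip()
```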

&lt;h2&gt;
  
  
  The frontend: animating 400+ trains at 60fps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Client-side polyline animation
&lt;/h3&gt;

&lt;p&gt;The backend sends updated positions every 60 seconds. If I rendered those directly, trains would teleport every minute. Instead, the frontend builds an animation chain for each train.&lt;/p&gt;
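&lt;p&gt;The core of each animation step is placing the train at a fraction of the way along its polyline, with the fraction advancing over the 60 seconds between updates. The real frontend does this in TypeScript inside a &lt;code&gt;requestAnimationFrame&lt;/code&gt; loop; here is the interpolation idea in Python for brevity:&lt;/p&gt;

```python
def point_along(polyline, fraction):
    """Position at `fraction` (0..1) of the polyline's total length."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    lengths = [dist(a, b) for a, b in zip(polyline, polyline[1:])]
    target = max(0.0, min(1.0, fraction)) * sum(lengths)
    for (a, b), seg in zip(zip(polyline, polyline[1:]), lengths):
        if target <= seg and seg > 0:
            t = target / seg  # linear interpolation within this segment
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg
    return polyline[-1]

# Each frame: position = point_along(path, elapsed_seconds / 60)
```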

&lt;h3&gt;
  
  
  Three.js instanced rendering
&lt;/h3&gt;

&lt;p&gt;Every train is rendered as a small 3D shape (a simplified body with a pointed roof) using Three.js. All trains live in a single Three.js &lt;code&gt;InstancedMesh&lt;/code&gt; added to MapLibre as a custom layer, sharing the same WebGL context as the map. Since MapLibre doesn't know the trains exist, clicking on them requires manual raycasting against the instanced mesh.&lt;/p&gt;

&lt;h2&gt;
  
  
  WebSocket with REST fallback
&lt;/h2&gt;

&lt;p&gt;The frontend opens a WebSocket connection for real-time pushes. If the connection drops, it automatically falls back to REST polling every 30 seconds while attempting to reconnect with exponential backoff (starting at 1 second, capping at 30 seconds).&lt;/p&gt;

&lt;p&gt;On the backend side, the worker never talks to WebSocket clients directly. It publishes updates to a Redis pub/sub channel. The API server subscribes to that channel and relays messages to all connected WebSocket clients, grouped by subscription rooms.&lt;/p&gt;

&lt;p&gt;If you've read this far, thanks ❤️ I'd love to know your feedback!&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>threejs</category>
      <category>showdev</category>
    </item>
    <item>
      <title>🌍 I Built MapMeet: A 3D Globe Event Platform for the Mux + DEV Challenge</title>
      <dc:creator>Kiran Baby</dc:creator>
      <pubDate>Wed, 31 Dec 2025 19:13:04 +0000</pubDate>
      <link>https://dev.to/kiranbaby14/i-built-mapmeet-a-3d-globe-event-platform-for-the-mux-dev-challenge-5ai7</link>
      <guid>https://dev.to/kiranbaby14/i-built-mapmeet-a-3d-globe-event-platform-for-the-mux-dev-challenge-5ai7</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mux-2025-12-03"&gt;DEV's Worldwide Show and Tell Challenge Presented by Mux&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🦈 Alright Sharks... I Mean, Judges!
&lt;/h2&gt;

&lt;p&gt;I'll be honest with you. I've binge-watched way too many episodes of Shark Tank. The drama, the pitches, the "I'm out" moments... I'm completely hooked.&lt;/p&gt;

&lt;p&gt;So when I saw this challenge was literally described as &lt;strong&gt;"Shark Tank but without the sharks"&lt;/strong&gt; I knew this was my moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Today I'm here to pitch you &lt;strong&gt;MapMeet&lt;/strong&gt;, a global event discovery platform that lets anyone create, discover, and join events visualized on a stunning &lt;strong&gt;interactive 3D globe&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But here's the twist that makes MapMeet different from every other event platform out there:&lt;/p&gt;

&lt;p&gt;🌐 &lt;strong&gt;Real-time geographic arcs&lt;/strong&gt; connect attendees to events on the globe. When someone RSVPs and shares their location, a beautiful animated arc draws from their location to the event showing the &lt;em&gt;global reach&lt;/em&gt; of your event in the most visually stunning way possible.&lt;/p&gt;

&lt;p&gt;Imagine hosting a hackathon and watching arcs light up from Tokyo, India, Lagos, Berlin, and San Francisco all converging on your event marker. &lt;em&gt;That's&lt;/em&gt; the MapMeet experience.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Psst... I created a live event for this hackathon so you can see it in action yourself. Link below! 👀)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Pitch Video
&lt;/h2&gt;

&lt;p&gt;

&lt;iframe src="https://player.mux.com/PoySx1xvSXhMei3qc02BKKW6BVmooortd00dt1YMpt4Lg" width="710" height="399"&gt;
&lt;/iframe&gt;



&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🌍 &lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://www.mapmeet.co" rel="noopener noreferrer"&gt;https://www.mapmeet.co&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎉 JOIN THE MAPMEET EVENT I CREATED FOR THIS HACKATHON!
&lt;/h3&gt;

&lt;p&gt;I've created a special event on MapMeet to celebrate this Mux + DEV challenge. Join it to show your support and see the platform in action! I'm on Premium so &lt;strong&gt;unlimited people can join&lt;/strong&gt; - let's see how global we can make this! 🌍&lt;/p&gt;

&lt;p&gt;I did some digging and set the event location at &lt;strong&gt;Mux HQ in San Francisco&lt;/strong&gt; so all our arcs will converge right on their doorstep 😄 Also made a custom Mux + DEV cover image for it because why not go all in?&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://www.mapmeet.co/event/62e4buqr" rel="noopener noreferrer"&gt;JOIN: MapMeet Launch Party - Mux + DEV Hackathon 🌍&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No account needed to view, just click and explore! If you RSVP, you'll become one of those beautiful arcs on the globe. Let's light it up together! 🌈&lt;/p&gt;

&lt;h2&gt;
  
  
  How MapMeet Works - Complete Overview
&lt;/h2&gt;

&lt;p&gt;

&lt;iframe src="https://player.mux.com/FfGH0201WK8LlcSO9i00Aupn4NBqxQ35zXBdB902uPIfTn8" width="710" height="399"&gt;
&lt;/iframe&gt;



&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story Behind It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem I Saw
&lt;/h3&gt;

&lt;p&gt;Every event platform feels &lt;em&gt;flat&lt;/em&gt;. You create an event, share a link, and hope people show up. There's no visual excitement, no sense of global community, no "wow factor" that makes people &lt;em&gt;want&lt;/em&gt; to share your event.&lt;/p&gt;

&lt;p&gt;I asked myself: &lt;strong&gt;What if attending an event felt like being part of something global?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The MapMeet Vision
&lt;/h3&gt;

&lt;p&gt;MapMeet transforms event hosting into a visual experience. Concert organizers can show fans flying in from around the world. Hackathon hosts can visualize their global developer community. Marathon coordinators can display runners coming from every continent. Conference speakers can see their audience's geographic spread. Community meetups can prove their worldwide reach to sponsors.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Shareability Secret
&lt;/h3&gt;

&lt;p&gt;Here's something I'm really proud of: &lt;strong&gt;Event pages don't require login to view.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is huge. When you share your MapMeet event link on WhatsApp, Instagram, Twitter, or LinkedIn, anyone can see your stunning 3D globe visualization, view attendee arcs from around the world, read all event details, and get hyped about joining.&lt;/p&gt;

&lt;p&gt;No friction. No "sign up to see more" walls. Just pure, shareable, eye-catching event pages that make people stop scrolling and say &lt;em&gt;"Wait, what is THIS?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This means your event promotion just got a serious upgrade. Instead of sharing a boring event link, you're sharing an interactive 3D experience. That's the kind of link people actually click.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building in Public: The Real Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Timeline
&lt;/h3&gt;

&lt;p&gt;I started building MapMeet around December 9th. I had the vision clear in my head: a 3D globe, real-time connections, the whole thing.&lt;/p&gt;

&lt;p&gt;But somewhere around week two, I hit a wall. You know that feeling when you're deep in code, nothing's working the way you want, and suddenly every other project idea seems more exciting? Yeah. I started drifting to other side projects, telling myself I'd come back to MapMeet "later."&lt;/p&gt;

&lt;p&gt;Then I saw this hackathon announcement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shark Tank-style pitches? Video submissions? $3,000 in prizes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That was the kick I needed. Having a deadline and a reason to ship changed everything. I went from "maybe I'll finish this someday" to "this is going live, and I'm pitching it to the world."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you, Mux and DEV, for the accountability.&lt;/em&gt; 🙏&lt;/p&gt;

&lt;h3&gt;
  
  
  First-Time Integrations
&lt;/h3&gt;

&lt;p&gt;This project pushed me into territory I'd never explored before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔐 Supabase (First Time)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'd heard about Supabase but never actually built with it. MapMeet uses Supabase Auth for Google OAuth and email/password authentication, Supabase Realtime for broadcasting live arc updates, and Supabase Storage for event cover images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💳 Stripe (First Time)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'd never implemented payments before. The idea of handling real money in my code was honestly intimidating.&lt;/p&gt;

&lt;p&gt;But Stripe's documentation is incredible. I set up checkout sessions for upgrading to Premium, and webhooks for syncing subscription status.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; The integrations you're scared of are usually the ones with the best documentation. Just start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;p&gt;While MapMeet isn't open-source (yet 👀), here's the architecture powering the platform:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI, SQLModel ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL (on Supabase)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supabase Auth (Google OAuth + Email/Password)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Realtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supabase Realtime (broadcast channels)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supabase Storage (event cover images)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Payments&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3D Globe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mapbox&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Domain &amp;amp; Hosting Setup
&lt;/h3&gt;

&lt;p&gt;Quick story: I snagged &lt;strong&gt;mapmeet.co&lt;/strong&gt; from GoDaddy because their pricing was great AND it included custom email addresses for the first year.&lt;/p&gt;

&lt;p&gt;Frontend is hosted on &lt;strong&gt;Vercel&lt;/strong&gt;. I just pointed my nameservers from GoDaddy to Vercel, and we're live with edge-fast global performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Business Model
&lt;/h3&gt;

&lt;p&gt;MapMeet runs on a freemium model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Free&lt;/th&gt;
&lt;th&gt;Premium&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Active Events&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attendees per Event&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time Arcs&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Marker Colors&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$19/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free tier is genuinely useful for small meetups and testing the platform. Premium unlocks MapMeet for serious event organizers who need scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use of Mux
&lt;/h2&gt;

&lt;p&gt;Let's talk about &lt;strong&gt;Mux&lt;/strong&gt; because this was a genuine discovery for me.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant Thumbnails via URL
&lt;/h3&gt;

&lt;p&gt;Need a screenshot from your video? With YouTube, you'd have to manually screenshot and upload it.&lt;/p&gt;

&lt;p&gt;With Mux? You just construct a URL. That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next for MapMeet?
&lt;/h2&gt;

&lt;p&gt;This hackathon was the push to ship v1, but I'm just getting started:&lt;/p&gt;

&lt;p&gt;🎥 &lt;strong&gt;Video integration&lt;/strong&gt; (now that I've discovered Mux!)&lt;br&gt;
🌐 &lt;strong&gt;Event categories&lt;/strong&gt; for better discovery&lt;br&gt;
📊 &lt;strong&gt;Analytics dashboard&lt;/strong&gt; for organizers&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;If you've made it this far, thank you. Seriously. It means the world.&lt;/p&gt;

&lt;p&gt;Here's how you can support MapMeet:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. 🌍 Join the Hackathon Event!
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.mapmeet.co/event/62e4buqr" rel="noopener noreferrer"&gt;JOIN: MapMeet Launch Party - Mux + DEV.to Hackathon 🌍&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Be one of the arcs on the globe! Let's make this the most globally distributed hackathon celebration ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 💬 Tell Me What You Think
&lt;/h3&gt;

&lt;p&gt;Drop a comment below. I read and respond to every single one.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. ❤️ React If This Resonated
&lt;/h3&gt;

&lt;h3&gt;
  
  
  4. 🔗 Share With Event Organizers
&lt;/h3&gt;

&lt;p&gt;Know someone who hosts meetups, conferences, or hackathons? Share MapMeet with them!&lt;/p&gt;

&lt;h2&gt;
  
  
  One Last Thing
&lt;/h2&gt;

&lt;p&gt;Building MapMeet taught me that the scariest part of any project is showing it to the world. It's easy to keep tweaking forever, telling yourself "it's not ready yet."&lt;/p&gt;

&lt;p&gt;This hackathon gave me a deadline and a stage. I'm grateful for that push.&lt;/p&gt;

&lt;p&gt;To everyone building something and waiting for the "right moment" to share it: &lt;strong&gt;this is your sign.&lt;/strong&gt; Ship it. Pitch it. Let the world see what you've made.&lt;/p&gt;

&lt;p&gt;The globe is waiting for your arcs. 🌍✨&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Try MapMeet:&lt;/strong&gt; &lt;a href="https://www.mapmeet.co" rel="noopener noreferrer"&gt;https://www.mapmeet.co&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Join the Event:&lt;/strong&gt; &lt;a href="https://www.mapmeet.co/event/62e4buqr" rel="noopener noreferrer"&gt;https://www.mapmeet.co/event/62e4buqr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What would YOU host on a 3D globe?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 24-hour coding marathon across time zones? A worldwide marathon watch party? A concert with fans lighting up from every continent?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drop yours below!&lt;/strong&gt; 👇&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>muxchallenge</category>
      <category>showandtell</category>
      <category>video</category>
    </item>
    <item>
      <title>Video Libraries Made Searchable by AI</title>
      <dc:creator>Kiran Baby</dc:creator>
      <pubDate>Fri, 26 Dec 2025 12:35:13 +0000</pubDate>
      <link>https://dev.to/kiranbaby14/i-built-a-video-search-engine-that-understands-what-youre-looking-for-51m7</link>
      <guid>https://dev.to/kiranbaby14/i-built-a-video-search-engine-that-understands-what-youre-looking-for-51m7</guid>
      <description>&lt;p&gt;&lt;strong&gt;Ever tried finding that ONE moment in a 2-hour video? Yeah, me too. It sucks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Back again with another project! Hope y'all had an amazing Christmas! 🎄 Jingle bells, jingle bells, jingle all the way&lt;/em&gt; ✌️&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;You recorded a meeting. Or a lecture. Or your kid's recital. Now you need to find that specific part where someone said something important, or that exact scene you vaguely remember.&lt;/p&gt;

&lt;p&gt;Your options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scrub through the entire video like a caveman&lt;/li&gt;
&lt;li&gt;Hope YouTube's auto-chapters got it right (they didn't)&lt;/li&gt;
&lt;li&gt;Give up and rewatch the whole thing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What if you could just... describe what you're looking for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Find the part where he talks about the budget"&lt;/p&gt;

&lt;p&gt;"Show me when there's a red car on screen"&lt;/p&gt;

&lt;p&gt;"Jump to where she mentions the deadline"&lt;/p&gt;

&lt;p&gt;That's what I built.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing SearchLightAI 🔦
&lt;/h2&gt;

&lt;p&gt;SearchLightAI lets you search your videos by describing what you see OR what was said. Upload a video, wait for it to process, then search with natural language.&lt;/p&gt;

&lt;p&gt;It returns the exact timestamp. Click it. You're there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search your videos like you search your documents.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack 🤓
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI + SQLModel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Databases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL (metadata) + Qdrant (vectors)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vision AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SigLIP2 (google/siglip2-base-patch16-512)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speech AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;faster-whisper + Sentence Transformers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Video Processing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FFmpeg + PySceneDetect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 16, React 19, Tailwind CSS, shadcn/ui&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;📥 Ingestion Pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Video Upload
    ↓
PySceneDetect → finds scene changes
    ↓
FFmpeg → extracts keyframes + audio
    ↓
faster-whisper → transcribes speech
    ↓
SigLIP2 → embeds keyframes (768-dim)
Sentence Transformers → embeds transcript (384-dim)
    ↓
Qdrant → stores all vectors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🔍 Search Pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your query: "when he talks about the budget"
    ↓
Same models embed your query
    ↓
Cosine similarity search in Qdrant
    ↓
Results ranked by relevance
    ↓
Click → jump to exact timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
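&lt;p&gt;The ranking step is just cosine similarity between the query embedding and every stored segment embedding. Here's a minimal pure-Python sketch of that one step (Qdrant does this at scale with proper indexing; the toy 3-dim vectors below stand in for the real 768/384-dim embeddings):&lt;/p&gt;

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def rank_segments(query_vec, segments, top_k=3):
    """Rank (timestamp, vector) pairs by similarity to the query."""
    scored = [(ts, cosine(query_vec, vec)) for ts, vec in segments]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy 3-dim "embeddings"; the real ones are 768-dim (SigLIP2) / 384-dim (text).
segments = [
    (12.5, [0.9, 0.1, 0.0]),
    (48.0, [0.1, 0.9, 0.1]),
    (95.2, [0.8, 0.2, 0.1]),
]
print(rank_segments([1.0, 0.0, 0.0], segments, top_k=2))  # best-matching timestamps first
```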






&lt;h2&gt;
  
  
  Three Search Modes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🎬 Visual Search&lt;/strong&gt; - Describe what you see&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"man standing near whiteboard"&lt;/li&gt;
&lt;li&gt;"outdoor scene with trees"&lt;/li&gt;
&lt;li&gt;"someone holding a laptop"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🎤 Speech Search&lt;/strong&gt; - What was said&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"when they mentioned the quarterly results"&lt;/li&gt;
&lt;li&gt;"the part about machine learning"&lt;/li&gt;
&lt;li&gt;"discussion about the timeline"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔀 Hybrid Search&lt;/strong&gt; - Best of both&lt;br&gt;
Combines visual and speech results. Usually what you want.&lt;/p&gt;
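&lt;p&gt;A simplified sketch of one way to combine the two result sets - max-merge the scores per timestamp so a moment found by either modality survives (the actual ranking logic in the repo is more involved; the names here are illustrative):&lt;/p&gt;

```python
def hybrid_merge(visual_hits, speech_hits, top_k=5):
    """Merge two {timestamp: score} result sets, keeping the better score
    when both modalities hit the same moment (illustrative strategy)."""
    merged = dict(visual_hits)
    for ts, score in speech_hits.items():
        merged[ts] = max(merged.get(ts, 0.0), score)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

visual = {12.5: 0.91, 48.0: 0.40}
speech = {48.0: 0.85, 95.2: 0.77}
print(hybrid_merge(visual, speech))
# 48.0 keeps its stronger speech score: [(12.5, 0.91), (48.0, 0.85), (95.2, 0.77)]
```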


&lt;h2&gt;
  
  
  The Secret Sauce: SigLIP2
&lt;/h2&gt;

&lt;p&gt;Most visual search uses CLIP. I went with SigLIP2 instead.&lt;/p&gt;

&lt;p&gt;Why? SigLIP uses sigmoid loss instead of softmax contrastive loss. The practical difference: better zero-shot performance, especially for fine-grained visual details.&lt;/p&gt;

&lt;p&gt;One quirk though - raw SigLIP scores are lower than you'd expect. A "great match" might be 0.25-0.35 cosine similarity. So I rescale them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rescale_siglip_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosine_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Maps SigLIP scores to intuitive 0-1 range.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;midpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.18&lt;/span&gt;
    &lt;span class="n"&gt;steepness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosine_score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;midpoint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;steepness&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now 0.35 → ~90%, 0.25 → ~70%, which feels right in the UI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Smart Keyframe Extraction
&lt;/h2&gt;

&lt;p&gt;I'm not extracting every frame (that would be insane). PySceneDetect uses adaptive content detection to find actual scene changes.&lt;/p&gt;

&lt;p&gt;For each scene, I grab:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frame at the start&lt;/li&gt;
&lt;li&gt;Frame at the middle (for scenes &amp;gt; 2 seconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives good coverage without exploding storage or processing time.&lt;/p&gt;
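&lt;p&gt;In sketch form, the keyframe selection boils down to this (timestamps in seconds; the scene tuples stand in for PySceneDetect's detected boundaries):&lt;/p&gt;

```python
def keyframe_times(scenes, min_middle_len=2.0):
    """Given (start, end) scene boundaries in seconds, return the timestamps
    to extract: the scene start, plus the midpoint for scenes longer than
    `min_middle_len` seconds."""
    times = []
    for start, end in scenes:
        times.append(start)
        if end - start > min_middle_len:
            times.append((start + end) / 2)
    return times

# Example scene boundaries: a short cut, a long scene, a medium scene
scenes = [(0.0, 1.5), (1.5, 9.5), (9.5, 12.0)]
print(keyframe_times(scenes))  # [0.0, 1.5, 5.5, 9.5, 10.75]
```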




&lt;h2&gt;
  
  
  Running It Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Docker Compose (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kiranbaby14/searchlightai.git
&lt;span class="nb"&gt;cd &lt;/span&gt;searchlightai

&lt;span class="nb"&gt;cp &lt;/span&gt;apps/server/.env.example apps/server/.env
&lt;span class="nb"&gt;cp &lt;/span&gt;apps/client/.env.example apps/client/.env

docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for models to load (around 2-3 min first time), then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API: &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NVIDIA GPU with CUDA support&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose&lt;/li&gt;
&lt;li&gt;4GB+ VRAM should be enough (SigLIP2 + faster-whisper + Sentence Transformers are relatively lightweight)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⏱️ &lt;strong&gt;Heads up:&lt;/strong&gt; Processing time depends on video length. A 10-min video takes a couple minutes, but longer videos (1hr+) will need more patience. Scene detection, transcription, and embedding generation all add up.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Could You Build With This?
&lt;/h2&gt;

&lt;p&gt;Some ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📹 &lt;strong&gt;Meeting search&lt;/strong&gt; - Find decisions across hundreds of recorded meetings&lt;/li&gt;
&lt;li&gt;🎓 &lt;strong&gt;Lecture navigation&lt;/strong&gt; - Students jumping to specific topics&lt;/li&gt;
&lt;li&gt;📺 &lt;strong&gt;Media asset management&lt;/strong&gt; - Search through footage libraries&lt;/li&gt;
&lt;li&gt;📱 &lt;strong&gt;Personal video search&lt;/strong&gt; - Your phone videos, finally searchable&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Code Is Yours
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/kiranbaby14/searchlightai" rel="noopener noreferrer"&gt;github.com/kiranbaby14/SearchLightAI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Star it ⭐ if you think video search should be this easy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Shoutouts 🙏
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SigLIP2&lt;/strong&gt; from Google for visual embeddings that actually work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PySceneDetect&lt;/strong&gt; for making scene detection actually usable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt; for a vector DB that just works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;faster-whisper&lt;/strong&gt; for Whisper that's actually fast&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  That's It. Go Break It.
&lt;/h2&gt;

&lt;p&gt;Clone it, throw your weirdest videos at it, see what breaks. File issues. Send PRs. Roast my code in the comments.&lt;/p&gt;

&lt;p&gt;The best part of putting stuff out there? Finding out all the ways you didn't think of using it.&lt;/p&gt;

&lt;p&gt;Catch you in the next one. ✌️&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ⚡, mass Claude Code sessions, and an unhealthy amount of caffeine ☕ by &lt;a href="https://github.com/kiranbaby14" rel="noopener noreferrer"&gt;@kiranbaby14&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a 3D AI Avatar That Actually Sees and Talks Back 🎭</title>
      <dc:creator>Kiran Baby</dc:creator>
      <pubDate>Fri, 26 Dec 2025 11:11:25 +0000</pubDate>
      <link>https://dev.to/kiranbaby14/i-built-a-3d-ai-avatar-that-actually-sees-and-talks-back-4j1a</link>
      <guid>https://dev.to/kiranbaby14/i-built-a-3d-ai-avatar-that-actually-sees-and-talks-back-4j1a</guid>
      <description>&lt;p&gt;&lt;strong&gt;Chatbots are so 2020. Let me show you what I built instead.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It's been ages since I last posted here. Hope y'all had a great Christmas! 🎄 Feels good to be back.&lt;/em&gt; ✌️&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Every AI Assistant Right Now
&lt;/h2&gt;

&lt;p&gt;You know what's annoying? Typing. &lt;/p&gt;

&lt;p&gt;Every AI tool out there wants you to &lt;em&gt;type type type&lt;/em&gt; like it's 1995. And don't even get me started on the ones that "listen" but can't see what you're showing them.&lt;/p&gt;

&lt;p&gt;So I asked myself: &lt;strong&gt;What if I built an AI that works like an actual conversation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;👀 &lt;strong&gt;Sees&lt;/strong&gt; what you show it (camera feed)&lt;/li&gt;
&lt;li&gt;👂 &lt;strong&gt;Hears&lt;/strong&gt; you naturally (no push-to-talk nonsense)&lt;/li&gt;
&lt;li&gt;🗣️ &lt;strong&gt;Responds&lt;/strong&gt; with voice and perfectly synced lip movements&lt;/li&gt;
&lt;li&gt;🎭 &lt;strong&gt;Expresses emotions&lt;/strong&gt; through a 3D avatar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And runs &lt;strong&gt;100% locally&lt;/strong&gt; on your machine. No API keys bleeding your wallet dry.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing TalkMateAI 🚀
&lt;/h2&gt;

&lt;p&gt;TalkMateAI is a real-time, multimodal AI companion. You talk to it, show it things through your camera, and it responds with natural speech while a 3D avatar lip-syncs perfectly to every word.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's like having a conversation with a character from a video game, except it's actually intelligent.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack (For My Fellow Nerds 🤓)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Backend (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FastAPI + WebSockets → Real-time bidirectional communication
PyTorch + Flash Attention 2 → GPU go brrrrr
OpenAI Whisper (tiny) → Speech recognition
SmolVLM2-256M-Video-Instruct → Vision-language understanding
Kokoro TTS → Natural voice synthesis with word-level timing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Frontend (TypeScript)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next.js 15 → Because Turbopack is fast af
Tailwind CSS + shadcn/ui → Pretty buttons
TalkingHead.js → 3D avatar with lip-sync magic
Web Audio API + AudioWorklet → Low-latency audio processing
Native WebSocket → None of that socket.io bloat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;Here's the flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You speak → 
  VAD detects speech → 
    Audio (+ camera frame if enabled) sent via WebSocket → 
      Whisper transcribes → 
        SmolVLM2 understands text + image together → 
          Generates response → 
            Kokoro synthesizes speech with timing data → 
              Audio + lip-sync data sent back → 
                3D avatar speaks with perfect sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of this happens in &lt;strong&gt;real-time&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Secret Sauce: Native Word Timing 🎯
&lt;/h2&gt;

&lt;p&gt;Most TTS solutions give you audio and that's it. You're left guessing when each word starts for lip-sync.&lt;/p&gt;

&lt;p&gt;Kokoro TTS gives you &lt;strong&gt;word-level timing data&lt;/strong&gt; out of the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;speakData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;audioBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;world&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;wtimes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;      &lt;span class="c1"&gt;// when each word starts&lt;/span&gt;
  &lt;span class="na"&gt;wdurations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;// how long each word lasts&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// TalkingHead uses this for pixel-perfect lip sync&lt;/span&gt;
&lt;span class="nx"&gt;headRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;speakAudio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;speakData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result? Lips that move &lt;em&gt;exactly&lt;/em&gt; when they should. No uncanny valley weirdness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Voice Activity Detection That Actually Works
&lt;/h2&gt;

&lt;p&gt;I didn't want push-to-talk. I wanted natural conversation flow.&lt;/p&gt;

&lt;p&gt;So I built a custom VAD using the Web Audio API's AudioWorklet. It calculates energy levels in real time and tracks speech frames vs. silence frames, all on the frontend, so no backend processing power is wasted on silence.&lt;/p&gt;

&lt;p&gt;You just... talk. When you pause naturally, it processes. When you keep talking, it waits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It respects conversational flow.&lt;/strong&gt;&lt;/p&gt;
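&lt;p&gt;The core idea is simple enough to sketch in a few lines. This Python version mirrors the logic (the real thing runs as a JS AudioWorklet; the threshold and frame counts here are made-up illustration values):&lt;/p&gt;

```python
def detect_utterances(frames, threshold=0.02, end_silence_frames=3):
    """Tiny energy-based VAD sketch. `frames` is a list of sample lists;
    an utterance ends after `end_silence_frames` consecutive quiet frames.
    Returns (start, end) frame-index pairs, end exclusive."""
    utterances, in_speech, silence, start = [], False, 0, 0
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)  # mean square energy
        if energy >= threshold:
            if not in_speech:
                in_speech, start = True, i
            silence = 0
        elif in_speech:
            silence += 1
            if silence >= end_silence_frames:
                utterances.append((start, i - silence + 1))
                in_speech, silence = False, 0
    if in_speech:  # audio ended mid-utterance
        utterances.append((start, len(frames) - silence))
    return utterances

loud, quiet = [0.5] * 4, [0.0] * 4
frames = [quiet, loud, loud, quiet, quiet, quiet, loud, loud]
print(detect_utterances(frames))  # [(1, 3), (6, 8)]
```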

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Heads up:&lt;/strong&gt; This version doesn't support barge-in (interrupting the avatar mid-speech) or sophisticated turn-taking detection. It's purely pause-based - you talk, pause, it responds.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Vision Component 👁️
&lt;/h2&gt;

&lt;p&gt;Here's where it gets spicy. The camera isn't just for show.&lt;/p&gt;

&lt;p&gt;When enabled, every audio segment gets sent &lt;em&gt;with&lt;/em&gt; a camera snapshot. SmolVLM2 processes both together - the audio transcription AND what it sees.&lt;/p&gt;

&lt;p&gt;You can literally say &lt;em&gt;"What am I holding?"&lt;/em&gt; and it'll tell you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running It Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;Python 3.10&lt;/li&gt;
&lt;li&gt;NVIDIA GPU (~4GB+ VRAM should work; I used an RTX 3070 8GB, but the models are lightweight: Whisper tiny + SmolVLM2-256M + Kokoro TTS)&lt;/li&gt;
&lt;li&gt;PNPM &amp;amp; UV package managers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone it&lt;/span&gt;
git clone https://github.com/kiranbaby14/TalkMateAI.git
&lt;span class="nb"&gt;cd &lt;/span&gt;TalkMateAI

&lt;span class="c"&gt;# Install everything&lt;/span&gt;
pnpm run monorepo-setup

&lt;span class="c"&gt;# Run both frontend and backend&lt;/span&gt;
pnpm dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Frontend: &lt;code&gt;http://localhost:3000&lt;/code&gt;&lt;br&gt;
Backend: &lt;code&gt;http://localhost:8000&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Can You Build With This?
&lt;/h2&gt;

&lt;p&gt;This is open source. Fork it. Break it. Make it weird.&lt;/p&gt;

&lt;p&gt;Some ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 &lt;strong&gt;Language tutors&lt;/strong&gt; that watch your pronunciation&lt;/li&gt;
&lt;li&gt;🎨 &lt;strong&gt;Creative companions&lt;/strong&gt; that see your art and give feedback&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Screen assistants&lt;/strong&gt; - combine with &lt;a href="https://github.com/mediar-ai/screenpipe" rel="noopener noreferrer"&gt;Screenpipe&lt;/a&gt; for an AI that knows what you've been doing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Code Is Yours
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/kiranbaby14/TalkMateAI" rel="noopener noreferrer"&gt;github.com/kiranbaby14/TalkMateAI&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🛠️ &lt;strong&gt;Fair warning:&lt;/strong&gt; This was a curiosity-driven project, not a polished product. There are rough edges, things I'd do differently now, and probably bugs I haven't found yet. But that's the fun of open source, right? Dig in, break stuff, make it better.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Star it ⭐ if you think chatbots should evolve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Shoutouts 🙏
&lt;/h2&gt;

&lt;p&gt;Big thanks to &lt;a href="https://github.com/met4citizen" rel="noopener noreferrer"&gt;met4citizen&lt;/a&gt; for the incredible &lt;a href="https://github.com/met4citizen/TalkingHead" rel="noopener noreferrer"&gt;TalkingHead&lt;/a&gt; library. The 3D avatar rendering and lip-sync magic? That's all their work. I just plugged it in and fed it audio + timing data. Absolute legend.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Would You Build?
&lt;/h2&gt;

&lt;p&gt;Seriously, drop a comment. I want to know what wild ideas you have for real-time multimodal AI.&lt;/p&gt;

&lt;p&gt;AI that sees + hears + responds naturally? That's not the future anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's right now. And you can run it on your GPU.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ❤️ and probably too much caffeine by &lt;a href="https://github.com/kiranbaby14" rel="noopener noreferrer"&gt;@kiranbaby14&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>opensource</category>
      <category>learning</category>
    </item>
    <item>
      <title>My First Blog and My First Game</title>
      <dc:creator>Kiran Baby</dc:creator>
      <pubDate>Thu, 27 Jan 2022 07:38:42 +0000</pubDate>
      <link>https://dev.to/kiranbaby14/my-first-blog-and-my-first-game-33dd</link>
      <guid>https://dev.to/kiranbaby14/my-first-blog-and-my-first-game-33dd</guid>
      <description>&lt;p&gt;Hey guys, so I am new to the DEV community and I am really excited to share my first blog about the first game that I created. The game was named as &lt;strong&gt;"Spheron-The ball game"&lt;/strong&gt; because the protagonist of the game was obviously a 'sphere' and I don't know from where the 'spheron' name popped up in my head. But anyway, I created this game a long while ago back in 2020 while I was doing my undergrad, and I managed to complete the game and upload it to the PlayStore once the colleges were closed due to the pandemic. I guess I am thankful for that which I shouldn't be, but hey, I got a lot of free time to develop the game. The game was made using unity engine and C# as its prgramming language. As I was a beginner into game dev I looked into and learned from a lot of youtube tutorials on how to build a game using unity. Brackey's youtube channel helped me a lot, I am sure Unity devs would've at least heard of this channel once in their lifetime. I know that the game is not an extraordinary or over-the-top one but it was my first game so it holds a special place in my heart. The genre of the game is an endless runner type and you could also collect coins along the way. I would link the game at the bottom of the post so you guys can check it out if you're interested.&lt;/p&gt;

&lt;h4&gt;
  
  
  Controls
&lt;/h4&gt;

&lt;p&gt;The controls are fairly simple&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Touch the right side of the screen to move right&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Touch the left side of the screen to move left&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The objective of the game is to get the protagonist, i.e. the sphere, to dodge all the obstacles that come along the way without falling off the platform, and to rack up the highest score you can. I've also created a coin system so the player can collect coins along the way, which can later be used to buy different skins for the character and, if the player dies midway, for resurrection.&lt;/p&gt;

&lt;p&gt;I've also incorporated ads into the app - but only reward ads, so you don't have to worry about ads popping up here and there and annoying you. The ad is completely optional: once the player dies, a popup menu appears with an ad button to resurrect the player and continue playing. I used Google AdMob for the implementation. At first, I messed up with the ads - when I uploaded my game to the Play Store, I clicked the ads many times myself on my own phone, and Google, as the all-seeing eye, found out and blocked my AdMob account. It got resolved later, though.&lt;/p&gt;

&lt;p&gt;So this was my first blog. I know it took me two years to write about the first game I made, but hey, I wrote it in the end. I hope to keep writing blogs on this wonderful platform. The next one will likely be about the second game I made, and once it's done I'll update the link here. I hope you guys enjoyed reading, and if you'd like to check out my game and give me feedback, the link's down below.&lt;/p&gt;

&lt;h4&gt;
  
  
  PlayStore Link
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://play.google.com/store/apps/details?id=com.Jbk.Spheron" rel="noopener noreferrer"&gt;https://play.google.com/store/apps/details?id=com.Jbk.Spheron&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Screenshots
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmt1owbp62j2ekggl2zeo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmt1owbp62j2ekggl2zeo.png" alt=" " width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h5buogjxgo1ghqkafgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2h5buogjxgo1ghqkafgg.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyxj4icijjcnm8klwuy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyxj4icijjcnm8klwuy8.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>unity3d</category>
      <category>beginners</category>
      <category>android</category>
    </item>
  </channel>
</rss>
