LinguaCam Live: A Professional OBS Overlay Suite with AI-Translated Global Captions, Unified Dynamic Chat, and Interactive Stream Widgets.
It started as an overlay with only two features.
We wanted to build a platform that uses bullet chat and live captions to create a streamlined overlay for OBS. The original vision was a simple theme setup for streamers, but as development progressed, the project evolved into a real-time interaction dashboard. I am obsessed with danmu chat because in most streams, if you don't speak the streamer's language or can't keep up with the fast vertical chat, you are basically invisible. I wanted to build a platform where the audience's voice becomes a first-class part of the video feed.
The vision was clear, yet two primary hurdles stood in my way:
The Latency Trap: Captions and bullet chats lose all value if they suffer from delays.
The Clutter Issue: Managing hundreds of messages flowing from right to left, without butchering readability or dropping content, was essential.
This project was never about building another video player to show off to your Google Meet buddies. It was about building a translation and interaction layer. The world is full of amazing content locked behind language barriers. Many creators provide high value in this economy but remain overshadowed by technical difficulties or a lack of time to engage globally. This tool was designed to bridge that gap and open up those walled gardens.
The Ear.
Without this: You wait 10 seconds. You see the chat react. You reply. By then the vibe has moved on and you look like you are reacting to something from the past.
I built a custom React hook called useYouTubeChat. It bypasses heavy polling (asking for data repeatedly) by creating a stream of raw data from the YouTube API. It's an always-on listener that catches messages the millisecond they are posted.
The Translator.
This is our middleware. Before a message hits the UI, it's intercepted by a translator utility. It processes the text in flight, ensuring that by the time the data packet reaches the viewer, it has already been localized.
The Fast Lane.
We use Socket.io (WebSockets). It keeps a live wire open between the server and the browser. This cuts the round-trip time, ensuring that the bullet chat stays perfectly synced with the live action.
When the lane was smooth, it felt like magic.
The page hooks into the Socket.io instance and the YouTube listener simultaneously. It manages a delicate lifecycle: capturing a message, passing it through the translation engine, and then firing it into the WebSocket relay so that every connected client sees the same "bullet" at the exact same millisecond.
What makes this module work is its event-driven nature. Rather than relying on heavy re-renders or global state stores that would lag under the pressure of a fast-moving chat, it uses a localized push mechanism. When useYouTubeChat detects a new entry, the module triggers a chain reaction:
Validation: The module strips away heavy, useless metadata, keeping only what matters (user, text, and timestamp). This keeps the packet size tiny so it travels faster across the network.
Transformation: It invokes the translator utility. This happens asynchronously, meaning the rest of the app doesn't have to freeze while waiting for the translation to finish.
Emission: Once the message is ready, it is emitted via Socket.io as a broadcast that tells every connected client: "render this specific message at this exact millisecond."
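The chain above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: the `RawChatItem` shape, the `translate` stub, and the `emit` callback are assumptions standing in for the YouTube payload, the Lingo.dev call, and the Socket.io broadcast.

```typescript
// Minimal sketch of the validate → transform → emit chain.
// RawChatItem mimics a (simplified) YouTube chat payload; the real
// one carries far more metadata than we want to ship over the wire.
interface RawChatItem {
  authorDetails: { displayName: string };
  snippet: { displayMessage: string; publishedAt: string };
  etag?: string; // example of metadata we deliberately drop
}

interface Bullet {
  user: string;
  text: string;
  ts: number;
}

// Validation: keep only user, text, and timestamp.
function sanitize(raw: RawChatItem): Bullet {
  return {
    user: raw.authorDetails.displayName,
    text: raw.snippet.displayMessage,
    ts: Date.parse(raw.snippet.publishedAt),
  };
}

// Transformation + Emission: translate asynchronously, then broadcast.
// `translate` and `emit` are injected so the chain stays testable.
async function processMessage(
  raw: RawChatItem,
  translate: (text: string) => Promise<string>,
  emit: (bullet: Bullet) => void,
): Promise<void> {
  const bullet = sanitize(raw);
  bullet.text = await translate(bullet.text); // the UI never blocks on this
  emit(bullet); // e.g. socket.emit("danmu", bullet) in the real app
}
```

In the real app, `emit` would be a `socket.emit(...)` call and `translate` would wrap the Lingo.dev request; swapping in stubs is also how you can unit-test the chain without a live stream.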
By centralizing this logic in a single high-level controller, the application maintains a single source of truth. This prevents the "Clutter Issue" mentioned earlier, as the page can throttle or prioritize messages before they ever hit the rendering engine, ensuring the stream remains readable even during peak activity.
The Reality Check.
The initial vision for the dashboard was a clean, reactive masterpiece, but the first time the system encountered a high-traffic stream, the perfect design broke. The clutter issue wasn't just visual; it was computational.

The primary point of failure was state synchronization. In a standard React app, updating state is trivial, but in a live dashboard receiving 20 messages per second, every state update triggers a re-render. Initially, the entire dashboard was re-rendering every time a single bullet appeared. This caused the video feed to stutter and the UI to become unresponsive: the exact latency trap I was trying to avoid.

Another reality check came from a hydration race condition. I attempted to use a sophisticated notification library to show top donors or new subscribers, but because the WebSockets were firing before Next.js's client-side hydration was complete, the app would crash or throw mismatch errors.

To survive to the finish line, I had to pivot to a functional-over-fancy mindset. I stripped away the heavy state management and moved to a ref-based queue system for the danmu. I also replaced the complex notification components with a simplified showToast utility that used a basic alert or raw DOM injection. It wasn't the polished UI I envisioned on day one, but it was the only way to ensure the platform didn't collapse under the weight of its own data.
Trade-offs:
In the race to build a functional prototype, certain long-term optimizations were sacrificed. The most significant trade-off is client-side translation: currently, every user's browser handles its own translation calls. While this works for a demo, a production-scale app would move this to a server-side worker or an edge function to protect API keys and reduce CPU load on the viewer's device. Another accepted debt is the memory management of the message queue. Right now the app keeps a running list of chat history in memory. For a short stream this is fine, but for a 24/7 broadcast it would eventually lead to a memory leak. The immediate fix was to focus on the now; the long-term solution requires robust garbage-collection logic to prune old messages.
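That pruning logic could be as simple as a cutoff by age plus a cap on count. This is a hypothetical sketch, not code from the repository; the shape of `StoredMessage` and the thresholds are invented for illustration.

```typescript
interface StoredMessage {
  text: string;
  ts: number; // epoch milliseconds
}

// Keep only messages newer than maxAgeMs, and never more than maxCount,
// so a 24/7 broadcast can't grow the history without bound.
function pruneHistory(
  history: StoredMessage[],
  now: number,
  maxAgeMs: number,
  maxCount: number,
): StoredMessage[] {
  const fresh = history.filter((m) => now - m.ts <= maxAgeMs);
  // If still too many, keep only the most recent maxCount entries.
  return fresh.slice(Math.max(0, fresh.length - maxCount));
}
```

Running this on an interval (or on every Nth push) turns the unbounded history into a sliding window.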
The Debugging War Story: The Hydration Race Condition
The most frustrating bug involved the WebSocket connecting before Next.js had fully hydrated the page. Because the socket was ready to receive data faster than React was ready to render it, the app would attempt to inject danmu into a DOM that didn't technically exist yet. The fix was a specific, unglamorous check inside a useEffect to ensure the component was fully mounted before the socket listeners were allowed to fire. It's a simple guard clause, but it was the difference between a white screen of death and a working dashboard.
Validation: How to Verify the Flow Locally
To verify this project on your own machine, clone the repository and run npm install. For environment setup, add your YouTube API key and Lingo.dev API key to a .env.local file. Launch the dev server with npm run dev and navigate to the /live route for the live test. Input any active YouTube live video ID. If the architecture is working, you will see the console log the connection, and translated messages should begin flying across the overlay within seconds.
The Missing Piece: What to Build Next
If someone forks this repository today, the most obvious missing piece is dynamic sentiment styling. The next step for this project is to analyze the mood of the incoming text. If the community is hyped, the danmu bullets should change color to a vibrant red or gold and increase in velocity. If the chat is calm, they should turn blue and slow down. This would turn the overlay from a simple message board into a living pulse of the creator's audience.
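A hedged sketch of what that mapping could look like, assuming a sentiment score in [-1, 1] from whatever analyzer a fork chooses; the thresholds, colors, and speed multipliers below are invented for illustration.

```typescript
interface BulletStyle {
  color: string;
  speedMultiplier: number; // 1.0 = baseline danmu velocity
}

// Map a sentiment score in [-1, 1] to a danmu style:
// hyped chat runs hot (gold/red, faster), calm chat cools down (blue, slower).
function styleForSentiment(score: number): BulletStyle {
  if (score >= 0.6) return { color: "gold", speedMultiplier: 1.5 };
  if (score >= 0.2) return { color: "red", speedMultiplier: 1.2 };
  if (score <= -0.2) return { color: "blue", speedMultiplier: 0.8 };
  return { color: "white", speedMultiplier: 1.0 }; // neutral baseline
}
```

The nice property of a pure mapping like this is that it slots into the existing pipeline as one more transformation step, right after translation and before emission.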
To sum up: LinguaCam Live is more than just a dashboard; it is a high-performance interactive bridge between a streamer and a global audience. It is designed to be pulled into OBS as a browser source, turning a basic video feed into a professional AI-powered broadcast.
- The Communication Core (Breaking Language Barriers)
AI-Translated Captions: It uses Lingo.dev to listen to the streamer's voice and turn it into English captions instantly.
Multilingual Support: It doesn't just transcribe; it translates. This allows a creator speaking any language to be understood by a global English-speaking audience in real-time.
YouTube Sync: It hooks into your YouTube Live chat, pulling comments out of the side window and into the actual video feed.
- The Visual Experience (Wave Danmu & FX)
Wave Danmu: Instead of boring, static text, chat messages move in a fluid, sine-based wave across the screen.
Collision-Free Logic: It uses an 8-lane vertical positioning system. This ensures that even if 100 people chat at once, the messages never overlap or hide the streamer's face.
Cinematic FX: Streamers can apply 20+ filters (like film grain, retro, or vibrant boosts) directly to their camera feed within the browser.
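The wave motion and the 8-lane system described above can be sketched with two small helpers. This is an illustrative model, not the project's code: the lane assignment below is a simple round-robin over 8 slots (the real overlay may track lane occupancy instead), and the wave is a sine offset added to the lane's base height.

```typescript
const LANE_COUNT = 8;

// Round-robin lane assignment: the n-th bullet goes to lane n mod 8,
// so consecutive messages never share a vertical track.
function assignLane(bulletIndex: number): number {
  return bulletIndex % LANE_COUNT;
}

// Vertical position of a bullet: the lane's base height plus a small
// sine wave driven by elapsed time, producing the fluid drift.
function bulletY(
  lane: number,
  elapsedMs: number,
  laneHeightPx: number,
  waveAmplitudePx: number,
): number {
  const base = lane * laneHeightPx + laneHeightPx / 2;
  return base + waveAmplitudePx * Math.sin(elapsedMs / 500);
}
```

Because the wave amplitude is kept smaller than half the lane height, each bullet oscillates inside its own lane and can never drift into a neighbor's, which is what keeps the overlay collision-free even at 100 messages at once.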
- The Interaction Suite (Audience Engagement)
Sticker Reaction Pop: Viewers can trigger emoji explosions (😊❤️🔥) that pop up on the stream overlay.
Voice Sounds: The streamer can trigger specific sound effects (SFX) using custom voice commands, making the broadcast feel like a high-budget TV show.
Quick Chat: A one-tap button system for the audience to engage instantly without typing long sentences.
- The Technical Edge (Speed & Control)
Live Pipeline: Everything happens with sub-100ms latency. This "ultra-low latency" is the "Ear" we discussed — it ensures the translation and the chat happen at the exact same time as the video.
Smart Focus: The software uses automatic pan-zoom framing to keep the streamer centered and in focus, mimicking a professional cameraman.
Setup APIs: A simple, secure control center to link your YouTube and Lingo.dev keys without needing to be a coding expert.
Thank you