DEV Community

Cover image for 5 must know open-source repositories to build cool AI apps
Ankur Tyagi
Ankur Tyagi

Posted on

5 must know open-source repositories to build cool AI apps

Everywhere I look, teams are racing to ship AI-powered features, from solo founders building chatbots to enterprise teams automating workflows. The momentum is massive, and the big players (OpenAI, Google, and Meta) are pouring billions into new models.

But here’s the truth: you don’t need their budgets to build something impressive. What you do need are the right open-source tools and frameworks that give you full control, transparency, and freedom to experiment.

After experimenting with tons of AI integrations, I’ve found a handful of open-source repositories that make building real-time, multimodal apps actually doable.

These tools let you move from idea to prototype fast, no black boxes, no vendor lock-in.


1. Stream Vision Agents: Build Real-Time Video + Audio Intelligence

One of the cooler projects I’ve seen lately is Stream Vision Agents, an open source framework for building real-time, multimodal AI that can see, hear, and respond in milliseconds.

It’s built for developers who want to bring true intelligence to live video without being locked into a single model or transport provider.

  • Open Source: Fork it, read it, improve it.
  • Open Platform: Works with Stream Video or any WebRTC-based SDK.
  • Flexible Providers: Plug in OpenAI Realtime, Gemini Live, or your favorite STT/TTS and vision models.
  • It’s a bit like LiveKit Agents, but with a bigger focus on real-time vision and multimodal intelligence.

Let’s take a look at this example:

Sports Coach:

You can spin up a golf coaching AI with YOLO and OpenAI Realtime as the brain. YOLO handles pose detection, while the Realtime API reacts to movements as they happen. No lag, no buffering.

The cool part is, it’s not just for golf. The same setup works for stuff like drone-based fire detection, sports or gaming analytics, physical therapy assistance, workout form correction, and interactive dance or movement-based games. Basically, anything that needs a live “eyes and ears” AI.

agent = Agent(
    edge=getstream.Edge(),
    agent_user=agent_user,
    instructions="Read @golf_coach.md",
    llm=openai.Realtime(fps=10),
    #llm=gemini.Realtime(fps=1), # Careful with FPS can get expensive
    processors=[ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt")],
)
Enter fullscreen mode Exit fullscreen mode

For more about Vision Agents, visit their documentation.

Star the Vision Agents repository ⭐


2. Open-Sora: High-Fidelity Text-to-Video Generation

Open-Sora is a super interesting open-source take on OpenAI’s Sora. It lets you convert text or images into short, high-quality videos that actually look stable (smooth motion, consistent frames, the whole thing). You can fine-tune it on your own datasets if you want to generate domain-specific stuff like marketing clips, story scenes, or quick simulations. It’s still early, but there’s a lot of room to experiment.

Why you’ll like it:

  • Supports text-to-video and image-to-video generation
  • Built for efficiency with diffusion-based architecture
  • Ideal for short clips (up to 15 seconds)
  • Actively maintained and open for contributions.

Star the Open Sora repository ⭐


3. OpenVoice v2: Instant Voice Cloning and Speech Synthesis

OpenVoice v2, built by the BentoML team, is one of the most impressive open-source voice cloning projects out there right now.

It can replicate a speaker’s tone and accent from just a few seconds of reference audio. It’s great for anything voice-driven. Think interactive AI agents, dubbing, or voice-enabled interfaces.

Why you’ll like it:

  • Multilingual and emotion-aware voice synthesis
  • Works well with real-time frameworks such as Stream Vision Agents
  • Simple API for inference and fine-tuning

Star the Open Voice repository ⭐


4. SpeechBrain: All-in-One Toolkit for Speech and Audio Intelligence

SpeechBrain is a PyTorch-based open-source toolkit that covers pretty much everything audio: ASR, TTS, speaker recognition, even speech enhancement.

It’s modular, easy to experiment with, and surprisingly production-ready. There are tons of prebuilt recipes if you just want to prototype fast or plug audio intelligence into something bigger you’re already building. .

Why you’ll like it:

  • Unified library for speech recognition and generation
  • Integrates easily with LLMs and real-time frameworks
  • Supports distributed and on-device inference

Star the Speech Brain repository ⭐


5. LiveKit Agents – Build Real-Time Voice and Video AI Applications

LiveKit Agents makes it easy to build real-time voice and video AI apps that actually feel live. Low latency, no awkward lag. You can run it locally or in the cloud, and hook it up to models like OpenAI Realtime, Gemini, or Whisper to handle the heavy lifting. It’s great for stuff like virtual meeting assistants, customer-support bots, or live translation apps.

Why you’ll like it:

  • Real-time streaming via WebRTC
  • Scales to thousands of concurrent sessions
  • Works seamlessly with custom or hosted LLMs

Star the agents repository ⭐


Thanks for reading the article.

In the comments below, let me know if other cool AI tools or frameworks have helped you build your application.

P.S. Feel free to follow me on X; I share valuable stuff - promise!

Top comments (11)

Collapse
 
arshadayvid profile image
David Asaolu

This is a great list! 🔥
Looking forward to trying out the Stream Vision agents

Collapse
 
tyaga001 profile image
Ankur Tyagi

thanks David

Collapse
 
timothyolanrewaju profile image
Timothy Olanrewaju

Solid list. Stream Vision Agents look interesting, would check it out.

Collapse
 
tyaga001 profile image
Ankur Tyagi

yes you should give it a try thx

Collapse
 
isaactony profile image
Isaac Tonyloi - SWE

Thank you for sharing !

Collapse
 
tyaga001 profile image
Ankur Tyagi

welcome

Collapse
 
emmanuelthecoder profile image
Emmanuel Aiyenigba

Nice article. Thanks for sharing.

Collapse
 
tyaga001 profile image
Ankur Tyagi

welcome

Collapse
 
olivia-john profile image
Olivia John

Awesome article! Thanks for sharing these resources.

Collapse
 
tyaga001 profile image
Ankur Tyagi

welcome

Collapse
 
getstream profile image
Stream

cool