This is a submission for DEV's Worldwide Show and Tell Challenge Presented by Mux
What I Built
I'm reusing a previous project of mine:
EchoSense: Your Pocket-Sized Companion for Smarter Meetings (Rafael Milewski, Nov 23 '24)
I built a portable device that captures its surroundings and enhances it with real-time insights and knowledge capabilities. Users can place the device in a meeting, for example, to get live transcription, ask questions to the AI it connects to, and receive automatic summaries. The system can package all of the content and send it by email for later review, analysis, or record keeping.
My Pitch Video
Demo
Since this project relies on dedicated hardware to function as intended, it's not possible to provide a full end-to-end demo without physically shipping a device to the judges.
However, I've created a simulation environment that allows you to preview the frontend experience and explore the core interactions:
Source Code: milewski/echosense-challenge (Making sense of echoes and delivering insights)
EchoSense
Portable device for real-time audio transcription and interactive summaries.
This is the main repository for my submission to the AssemblyAI Challenge.
- Esp32: The firmware source code for the ESP32-S3-Zero device.
- Frontend: The UI that communicates with the device via websocket.
Each subfolder includes instructions for running the project locally.
For a more detailed overview, including screenshots, you can read the submission sent to the challenge here:
https://dev.to/milewski/echosense-your-pocket-sized-companion-for-smarter-meetings-3i71
The Story Behind It
This project was originally created for a dev.to hackathon, but the idea was inspired by a real-world observation. I often attend meetings where most participants are non-native English speakers, and I noticed some coworkers relying on automatic captioning software installed on their computers just to keep up, either because their English isn't strong enough or because the mix of accents made things difficult to understand.
This was the root inspiration for EchoSense. I wanted to build something that not only provided real-time captioning, but also went beyond what tools like Zoom or Teams offered at the time by adding AI-powered insights, the ability to ask the AI questions, automatic summaries, and more. I believe a device like this can be useful for a wide range of users, and since it's simple to assemble, it could even make a fun weekend project.
I also drew inspiration from a couple of open-source hardware projects, such as:
Both share all the files and tutorials needed to build a "smart" device yourself. I love this approach; it's a great way to learn, experiment, and deepen your understanding of how hardware and software work together.
Technical Highlights
The stack isn't complicated. The device is built on an ESP32, but instead of using traditional C-based code, it's written in Rust. I chose Rust because I believe it's better suited for embedded development: it provides a modern developer experience, strong safety guarantees, and excellent performance. In practice, this made development significantly easier and less error-prone than if I had written it in C.
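To give a feel for what that looks like in practice, here is a minimal blinky-style sketch in the spirit of the esp-idf-hal project templates. It is not taken from the EchoSense firmware; the GPIO pin and crate setup are assumptions for illustration only:

```rust
use esp_idf_hal::delay::FreeRtos;
use esp_idf_hal::gpio::PinDriver;
use esp_idf_hal::peripherals::Peripherals;

fn main() {
    // Runtime patch hook required by the esp-idf template projects.
    esp_idf_svc::sys::link_patches();

    // Take ownership of the chip peripherals, then claim one GPIO as an output.
    let peripherals = Peripherals::take().unwrap();
    let mut led = PinDriver::output(peripherals.pins.gpio2).unwrap();

    loop {
        // Typed, checked GPIO access instead of raw register pokes.
        led.set_high().unwrap();
        FreeRtos::delay_ms(500);
        led.set_low().unwrap();
        FreeRtos::delay_ms(500);
    }
}
```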
Due to hardware limitations, the device itself isn't capable of running transcription or large language model tasks locally. Instead, it captures audio and streams data to a third-party service for real-time transcription and AI processing, keeping the hardware lightweight while still enabling powerful functionality.
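The real firmware talks to the server over WebSocket, but the core idea is simply reading small fixed-size chunks of audio and forwarding them as soon as they arrive, so nothing large ever sits in RAM. Below is a simplified, desktop-runnable sketch of that loop; the plain TCP socket, the PCM file standing in for the microphone, and the address and chunk size are all assumptions, not the actual EchoSense code:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

// Illustrative chunk size: small enough to fit comfortably in a
// microcontroller's limited RAM (the real value depends on the board).
const CHUNK_BYTES: usize = 1024;

fn stream_audio(mut source: impl Read, server: &str) -> std::io::Result<()> {
    let mut socket = TcpStream::connect(server)?;
    let mut chunk = [0u8; CHUNK_BYTES];

    loop {
        // Read the next slice of raw PCM audio from the capture source.
        let n = source.read(&mut chunk)?;
        if n == 0 {
            break; // capture source exhausted
        }
        // Forward it immediately; nothing larger than CHUNK_BYTES is ever buffered.
        socket.write_all(&chunk[..n])?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Stand-in for the microphone: a pre-recorded PCM file.
    let file = std::fs::File::open("sample.pcm")?;
    stream_audio(file, "127.0.0.1:9000")
}
```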
One interesting challenge I faced during this project was that the ESP32 variant I chose has very little stack memory. The device would sometimes crash simply because a JSON array from the server contained one extra item; in other words, a response of just a few kilobytes could instantly crash it at parse time because there wasn't enough memory to hold the data. That's almost unthinkable for a generation of developers used to effectively unlimited resources, who architect as if client devices have infinite RAM. In most cases that mindset causes no issues, because memory is abundant, but on a constrained device like this, every byte matters and small decisions can have huge consequences.
Because of this limitation, the software had to be written with extreme efficiency to avoid stack overflows while still handling tasks in real time and serving multiple clients.
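To illustrate the kind of defensive parsing this forces, here is a hedged sketch (not the actual EchoSense code) using the serde-json-core and heapless crates, which deserialize JSON into fixed-capacity types without heap allocation. The payload shape and the bounds are invented for the example:

```rust
// Assumed dependencies: serde (with "derive"), serde-json-core,
// and heapless with its "serde" feature enabled.
use heapless::{String, Vec};
use serde::Deserialize;

// Hypothetical shape of a transcription update from the server.
// Every field has a hard upper bound, so the parsed value has a fixed,
// known size instead of growing with whatever the server sends.
#[derive(Debug, Deserialize)]
struct TranscriptUpdate {
    words: Vec<String<32>, 8>, // at most 8 words of up to 32 bytes each
}

fn main() {
    let payload = br#"{"words":["hello","world"]}"#;

    // serde-json-core parses without heap allocation; if the array holds
    // more than 8 entries, the parse fails cleanly instead of exhausting
    // memory on a constrained device.
    match serde_json_core::de::from_slice::<TranscriptUpdate>(payload) {
        Ok((update, _bytes_read)) => println!("parsed: {:?}", update),
        Err(e) => println!("payload too large or malformed: {:?}", e),
    }
}
```

Capping every field up front turns "the server sent one item too many" from a crash into an ordinary, recoverable error.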
This experience gave me a glimpse of what it must have been like to build computer software or video games when machines were orders of magnitude slower than this microcontroller. It's fascinating how far computing has come, and how disconnected most modern developers are from the constraints that used to define everyday programming.
