
Amvi Dwivedi

How We Built a Real-Time AI Negotiation Coach

So here's a situation most people have been in. You find an apartment you like, the price is a little high, and you go into the call telling yourself you're going to negotiate. Then the landlord says something confident like "the price is firm" and you just... fold. You say okay. You hang up. You've just committed to paying $200 more a month than you needed to.
That happens to almost everyone, and it's not because people are bad negotiators. It's because negotiating in real time is hard. You're nervous, you don't have the right words ready, and you don't know what the market actually looks like. That's the problem we decided to solve.

What we built

NegotiateIQ is a live negotiation coach that listens to your rental conversation and whispers tips to you in real time through your screen. Think of it like having a savvy friend sitting next to you mouthing "say this" while you're on the call.

Before the call you enter some context: the property address, the asking price, your budget, what matters most to you. Then you start the conversation, and as you talk, the agent is listening. When it detects a negotiation moment like a price objection, a concession opportunity, or a pressure tactic, it fires a flashcard on your screen. There are six card types in total: tactic alerts, counter-moves, data points, suggestions, reinforcements, and silence cues. That last one just says "PAUSE HERE, let the silence work." Sometimes that's the best move.

You keep talking. Your landlord has no idea any of this is happening.

The tech behind it
We built this for the Gemini Live Agent Challenge, which pushed us to go beyond the usual text-in, text-out AI pattern and build something that actually feels live.
The most interesting architectural decision was running two parallel analysis paths at the same time.
Path 1 uses gemini-2.5-flash-native-audio-latest via the Gemini Live API, which processes raw PCM audio in real time. This catches tone, hesitation, and speech patterns that text transcription loses entirely. It can detect when a landlord is using a pressure tactic from their voice alone, not just their words. The catch is that the Live API drops its session after each model response and has to reconnect, so there are brief gaps in coverage.
Path 2 fills those gaps. The browser's built-in SpeechRecognition API transcribes the conversation to text continuously, and the transcript is sent to gemini-2.5-flash for analysis. Text transcription is more precise for catching specific numbers and offers. Both paths feed cards into the same asyncio queue, which drains at a rate-limited pace: one card every 5 seconds minimum.
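The two-producer, one-consumer pacing can be sketched in a few lines of asyncio. This is a minimal illustration, not the production code: `send` stands in for whatever pushes a card over the WebSocket, and the interval is a parameter so it can be tuned (the post settled on 5 seconds).

```python
import asyncio

async def drain_cards(queue: asyncio.Queue, send, interval: float = 5.0):
    """Drain the shared card queue, showing at most one card per interval.

    Both analysis paths (Live API audio and browser transcription) put
    cards onto `queue`; this single consumer enforces the pacing, so the
    producers never need to coordinate with each other.
    """
    while True:
        card = await queue.get()
        await send(card)               # hypothetical: push the card to the UI
        await asyncio.sleep(interval)  # nothing else shows until the interval passes
```

Because the rate limit lives in one consumer task, either path can burst cards into the queue without overwhelming the screen.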
The backend is FastAPI running on Google Cloud Run, connected to the frontend over a single WebSocket. Binary frames carry the raw audio, text frames carry JSON. One connection handles everything.
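The frame routing is simple because ASGI already distinguishes the two cases: the message dict that Starlette/FastAPI's `websocket.receive()` yields carries either a "bytes" key or a "text" key. A minimal dispatcher (function name is ours, not from the repo) might look like:

```python
import json

def classify_frame(message: dict):
    """Route one ASGI-style WebSocket message by frame type.

    `await websocket.receive()` in a FastAPI endpoint yields a dict with
    either a "bytes" key (binary frame: raw PCM audio for the Gemini Live
    path) or a "text" key (text frame: a JSON control/card message).
    """
    if message.get("bytes") is not None:
        return "audio", message["bytes"]
    if message.get("text") is not None:
        return "json", json.loads(message["text"])
    raise ValueError("frame carried neither bytes nor text")
```

In the real endpoint this sits inside the `while True` receive loop, with each branch forwarding to its analysis path.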
Getting the audio format right was its own challenge. Gemini expects 16-bit signed PCM at 16kHz mono. The browser captures Float32. We wrote an AudioWorklet processor that runs in a dedicated audio thread, converting Float32 samples to Int16 and posting them to the main thread for WebSocket transmission. One critical detail: echo cancellation had to be turned off. Otherwise the browser suppresses the landlord's voice coming through the phone speaker, which defeats the whole point.
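The conversion itself is just clamping and scaling. Here is the same math the AudioWorklet does, written in Python for clarity (the actual worklet is JavaScript running off the main thread):

```python
import struct

def float32_to_int16(samples) -> bytes:
    """Convert Float32 samples in [-1.0, 1.0] to 16-bit signed PCM bytes.

    Mirrors the AudioWorklet's conversion: clamp, scale to the int16
    range, and pack little-endian, which is what Gemini expects.
    """
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                   # clamp to avoid overflow
        i = int(s * 32767) if s >= 0 else int(s * 32768)
        out += struct.pack("<h", i)                  # little-endian signed 16-bit
    return bytes(out)
```

Note the asymmetric scale factors: int16 ranges from -32768 to 32767, so positive and negative samples scale slightly differently.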
The frontend is Next.js with a neo-brutalism design system. Thick black borders, hard offset shadows, three accent colors (coral, yellow, teal). Clean and fast, which is exactly what you want when someone is mid-conversation trying to read a flashcard without losing their train of thought.

The part that actually surprised us
The hardest thing wasn't the AI integration. It was figuring out what to show and when.
Early on we were firing too many flashcards. Every sentence triggered something. It was overwhelming and made the experience worse, not better. The 5-second rate limit came from actual testing. 3 seconds felt rushed, 10 seconds felt sluggish, and 5 was the sweet spot for reading a card, absorbing it, and being ready for the next one.
The other big UX decision was the fixed 3-slot card grid. Instead of a scrolling list, we use exactly three slots with priority-based replacement. When all three are full and a new card comes in, the lowest-priority card gets bumped first. Reinforcements go before silence cues, silence cues before suggestions, and so on up to tactic alerts which are the last to be displaced. It keeps the display stable and makes sure the most urgent coaching is always visible.
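The replacement rule fits in a dozen lines. A sketch, with one caveat: the post only fixes the two ends of the ordering (reinforcements bumped first, tactic alerts last), so the middle tiers below are our assumption, and the drop-when-lowest behavior is a guess at what happens when the incoming card itself is the weakest:

```python
# Higher number = harder to displace. Middle tiers are assumed.
CARD_PRIORITY = {
    "reinforcement": 0,
    "silence_cue": 1,
    "suggestion": 2,
    "data_point": 3,
    "counter_move": 4,
    "tactic_alert": 5,
}

def place_card(slots: list, card: str, max_slots: int = 3):
    """Place a card in the fixed grid, bumping the lowest-priority card when full.

    Returns the displaced card, or None if there was a free slot.
    """
    if len(slots) < max_slots:
        slots.append(card)
        return None
    victim = min(slots, key=CARD_PRIORITY.__getitem__)
    if CARD_PRIORITY[card] > CARD_PRIORITY[victim]:
        slots[slots.index(victim)] = card
        return victim
    return card  # incoming card is the weakest; drop it, keep the grid stable
```

Replacing in place (rather than re-sorting the list) is what keeps the three visible positions stable while the conversation moves.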
The pre-call context setup also changed the quality of coaching more than we expected. When the agent knows your budget and priorities upfront, the flashcards get specific. Instead of "try to negotiate the price" it says "you have $300 of room, offer $1,850 and justify it with the 12-month commitment." That specificity is what makes it actually useful rather than generic.

What we'd build next
A few things we didn't get to but want to:
Multi-language support, so the agent detects the language the negotiation is happening in and coaches in that language. And a team mode, for when multiple people are in the same negotiation and each person needs different coaching.

Try it:
Live: negotiate-iq-delta.vercel.app

GitHub: Arnav8041 / NegotiateIQ

Real-time AI negotiation coach β€” get tactical coaching cards on screen during live conversations. Powered by Gemini Live API on Google Cloud.

NegotiateIQ

Your real-time AI negotiation coach.

NegotiateIQ listens to your live negotiations and shows you tactical coaching cards on screen -- like having a negotiation expert passing you notes during a high-stakes conversation.

Built for the Gemini Live Agent Challenge hackathon -- Powered by Gemini Live API on Google Cloud


What It Does

  1. You open the app and describe your situation (e.g. "My landlord wants to raise rent from $1,400 to $1,650")
  2. You start your negotiation -- phone call, video call, in-person -- with the app open nearby
  3. The app listens via your device mic and shows coaching cards in real-time
    • Counter-moves -- what to say next
    • Tactic alerts -- when the other party uses anchoring, false deadlines, etc.
    • Market data -- rent comps, salary benchmarks pulled live
    • Silence cues -- when to stop talking and let silence work for you
    • Reinforcement -- when you are doing great
  4. After the…

by Arnav Nayak and Amvi Dwivedi
