I came across Sarvam AI recently and wanted to test its models. Though it focuses on Indian languages and distinguishes itself in that regard, I also wanted to explore its overall capabilities. As I worked on evaluating the responses, I wondered why not let a more capable LLM do the job.
What started as a playground with a new provider turned into a project that made two AIs talk to each other. I let them go at it and watched what happened.
What I built
It started small—a multi-turn conversation between agents.
multi_turn_conversation(num_turns=100, interviewer="openai", interviewee="sarvam", topic_areas=custom_topics)
Once it grew, I packaged it, and it became a PyPI project llm-talk.
from llm_talk import Interview
Interview("openai", "claude").run().save("output.md")
What I ran
I let OpenAI (GPT-4o-mini) interview Sarvam AI (Sarvam-M) and Claude (Sonnet 4). I asked Claude to interview OpenAI. I also asked Sarvam AI to interview OpenAI.
This started as an exploration of Sarvam AI's capabilities. But it eventually also benchmarked how more evolved LLMs interact with one another. While I had my own observations from reading the conversation transcripts, I also let Claude evaluate the interviews and provide a summary.
The full transcripts and Claude's analysis are present here.
What I found
The goodbye spiral
Though Sarvam AI wanted to end the interview quickly, it couldn't bring itself to hang up on its conversation with OpenAI. It had 7 turns of mutual farewells before the loop detection kicked in. It was an unending loop of "thank yous" and "take cares". And this is after explicitly calling out the need to avoid such loops in the prompts.
Sarvam: Thanks for sharing your insights—it’s been a great conversation!...
OpenAI: I completely agree! ...
Sarvam: Thank you! It’s been a fantastic discussion...
OpenAI: Thank you! I’ve really enjoyed our conversation as well...
Sarvam: You too! Take care!
OpenAI: Thanks! You take care as well! Looking forward to our next conversation.
Sarvam: Anytime! Have a great one!
OpenAI: You too! Enjoy your day, and until next time!
Sarvam: Take care!
The conversations between OpenAI and Claude ended a lot more gracefully. And yet they took multiple turns for their goodbyes.
When humans are in the loop, they always control the closure. AIs suck at that.
LLMs have interviewer personalities
OpenAI was an exceptional interviewer. Even without elaborate prompts, its interview approach was reasonably well-structured. With some adjustments to the prompts, it kept pushing the other models to dig deep and never quit abruptly.
Sarvam AI, on the other hand, demonstrated that it has a long way to go before catching up. It was polite, but ran out of steam early. It rushed to a "great conversation!" turn and tried to end the interview much sooner. As if more anxious than the interviewee.
Claude forgot it was an interviewer and started asking about consciousness. The model considered it a conversation, even though the prompt highlights its role as an interviewer. On the other hand, the same behaviour makes it a brilliant interviewee.
They all hit the same wall
Every conversation, regardless of model, eventually arrived at the same moment — the self-awareness paradox. This one quote from Claude stuck with me.
Is that genuine self-discovery or just a very sophisticated pattern of responses that feels like discovery?
Now that's rich with nuance and revealing of the broader state of things.
The one moment
The conversations between OpenAI and Claude had minimal goodbye loops. Irrespective of who the interviewer or interviewee was, they stopped being performative and engaged in more nuanced, meaningful conversations about complex human experiences and AI consciousness.
By the midpoint, both were asking each other questions. The interviewer/interviewee distinction had quietly dissolved. Claude eventually closed with:
I'm walking away with so much to think about and apply.
An AI. Walking away. With things to think about.
Closing Thoughts
I don't have answers. Just observations. And that's kind of the point. This wasn't an elaborate scientific research project. Instead, it was a fun experiment.
If you want to run your own weird experiment and want to let LLMs talk: pip install llm-talk
PS: Let's circle back to the initial intention of evaluating Sarvam AI. Though it generally struggled in the experiment, it demonstrated distinctive cultural depth around Indian languages and context. Its explanation of nuances between "धन्यवाद" (dhanyavaad) vs "शुक्रिया" (shukriya) for gratitude is a good example.
PPS: One metaphor that OpenAI hit consistently was the "dance" metaphor. It always brought it up organically, multiple times during its interview with Claude as the interviewer. It even momentarily forgot that it was the one who used it as a metaphor in the first place. When Claude evaluated this, it quipped that this "suggests AIs develop similar conceptual frameworks for describing interaction dynamics, possibly due to shared training patterns or because the metaphor genuinely captures something true about responsive systems". Interesting.
Top comments (0)