Mak Sò

Posted on Jul 2 • Edited on Jul 13

🧠 Beyond Structured Chaos

#ai #agentaichallenge #llm #rag

02/06/2025: Experiment results are available here: experiment_society01

When Digital Thinkers Hash Things Out: Five Conversations and What We Saw

If you missed the first piece, here is the short recap:

We built a small talking circle of four software agents (Logic, Empathy, Skeptic, and Historian) plus a Moderator that keeps the peace. They debate a question over and over, reading their own past comments each time. The round robin ends as soon as their answers overlap by at least eighty five percent, or after fifteen tries if they cannot get there.

For this follow up we ran five fresh sessions. We threw out an earlier test run that never settled. This article retells those five sessions in friendly language.

1. A Friendly Foreword

The idea of letting computer programs argue with one another may sound like science fiction, yet that is exactly what the OrKa platform lets us explore. If you have ever watched a panel debate on television, you know that lively disagreement, pauses for reflection, and gradual shifts in attitude often tell you more than the final verdict itself. The same holds true here, except our panelists are software agents.

OrKa started in April and today reached version 0.7.0, gaining the memory features needed to let agents remember their prior words. With memory in place we could try something bold: ask them difficult questions, invite them to talk among themselves, and watch them circle around their own uncertainty until they either line up or throw in the towel.

We ran six tests at first. One of those tests kept spinning its wheels, so we left that one out of the public story. The five remaining sessions make up the heart of this article. They reveal an encouraging pattern: disagreement is not the enemy, and patience often produces surprising common ground.

My goal here is to walk through every step in plain language. You will meet each digital speaker, peek inside the looping process, read human style summaries of each debate, and see how the numbers fit the narrative. Just as important, you will see where the software hesitated, second guessed itself, and sometimes taught itself a new angle. Those moments of hesitation are the most human like part of the story.

The rest of this long journey is divided into sections that each focus on one aspect of the project. If you want to skip ahead to a specific debate, the table of contents below can guide you. Otherwise grab a cup of coffee, settle in, and enjoy the ride through five lively digital conversations.

Quick Table of Contents

1. A Friendly Foreword
2. Meet the Digital Roundtable
3. Building the Conversation Engine
4. The Five Questions and Why They Matter
5. Run Two: How Four Loops Settled AI Regulation
6. Run Three: Agreement at the Speed of Two Loops
7. Run Four: Asking About OrKa and Its Creator
8. Run Five: Pluto’s Planet Status Revisited
9. Run Six: Minsky’s Old Book Faces Modern Minds
10. Patterns Across All Five Sessions
11. The Road Ahead and Final Thoughts

2. Meet the Digital Roundtable

Every debate needs a cast of characters. In this case our characters are simple software scripts that wrap around a large language model. Each script feeds the model a different set of instructions, giving it a distinct voice. Think of the underlying language model as a versatile actor and the scripts as costumes and motivation notes.
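To make the costume idea concrete, here is a minimal sketch of how one persona could wrap a shared model. It is not OrKa's actual code: the `Persona` class, the `build_prompt` method, and the instruction wording are all hypothetical, chosen only to show that the agents differ in instructions and not in the underlying model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """One 'costume' for the shared language model (hypothetical structure)."""

    name: str
    instructions: str  # illustrative wording, not OrKa's real prompts

    def build_prompt(self, question: str, last_answer: str = "", suggestion: str = "") -> str:
        # Every persona sends the same question; only the instructions differ.
        parts = [f"You are {self.name}. {self.instructions}", f"Question: {question}"]
        if last_answer:
            parts.append(f"Your previous answer: {last_answer}")
        if suggestion:
            parts.append(f"Moderator suggestion: {suggestion}")
        return "\n".join(parts)


PANEL = [
    Persona("Logic", "Lay out clean arguments and cite formal rules."),
    Persona("Empathy", "Speak for human concerns and emotional impact."),
    Persona("Skeptic", "Question every claim and challenge hidden assumptions."),
    Persona("Historian", "Recall earlier statements and provide context."),
]
```

Calling `PANEL[0].build_prompt("Should governments regulate AI?")` yields a prompt that opens with Logic's instructions. The string would then be sent to whatever model sits underneath.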

Below you will find a straightforward introduction to each member of the panel, along with a short sample quote that captures the spirit of that agent. The quotes come from early test runs and are edited only for length.

2.1 Logic

Job in the circle: Lay out clean arguments, cite formal rules, remove fuzzy feelings.

Personality flavor: Confident and precise.

Sample quote: “Given the harm principle and the factual trend of bias amplification, government oversight remains logically necessary.”

2.2 Empathy

Job in the circle: Speak for human concerns and emotional impact.

Personality flavor: Warm, careful, often reminds others of vulnerable populations.

Sample quote: “People already feel invisible in digital systems. Any policy that ignores their lived experience risks deepening the wound.”

2.3 Skeptic

Job in the circle: Question every claim, challenge hidden assumptions, refuse easy harmony.

Personality flavor: Contrarian but not malicious.

Sample quote: “We assume new labels fix old bias, yet history shows ambitious regulation often just moves the bias to a new corner.”

2.4 Historian

Job in the circle: Fetch memories from earlier loops, highlight past statements, provide context.

Personality flavor: Patient, sometimes dry.

Sample quote: “In loop three Logic said, ‘regulation guarantees accountability.’ In loop five that shifted to ‘likely improves accountability.’ Noted for future.”

2.5 Moderator

Job in the circle: Read every other agent’s answer, calculate how similar they are, and suggest ways to close the remaining gap.

Personality flavor: Neutral, concise.

Sample quote: “Current overlap is zero point seven two. Empathy and Skeptic disagree on enforcement speed. A compromise might involve phased rollout.”

The Moderator does not decide truth, nor does it force agreement. It simply measures similarity, summarises friction points, and offers a nudge. If the agents want to ignore that nudge, they can. That freedom is crucial. Forced consensus would cheapen the exercise.
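A small sketch can show what "measure, summarise, nudge" might look like in code. Everything here is an assumption for illustration: the function names are hypothetical, the pair scores are made-up numbers, and the real Moderator writes its suggestion with a language model, not a template.

```python
def widest_gap(pair_scores: dict[tuple[str, str], float]) -> tuple[str, str]:
    """Return the pair of agents with the lowest similarity score."""
    return min(pair_scores, key=pair_scores.get)


def moderator_note(pair_scores: dict[tuple[str, str], float]) -> str:
    """Report the average overlap and name the pair that is furthest apart."""
    overall = sum(pair_scores.values()) / len(pair_scores)
    first, second = widest_gap(pair_scores)
    return f"Current overlap is {overall:.2f}. {first} and {second} are furthest apart."


# Made-up scores, only to show the shape of the input.
example_scores = {
    ("Logic", "Empathy"): 0.8,
    ("Logic", "Skeptic"): 0.7,
    ("Empathy", "Skeptic"): 0.6,
}
note = moderator_note(example_scores)
```

Note that the function only reports. It never edits an agent's answer, which matches the scorekeeper role described above.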

3. Building the Conversation Engine

The core idea is easy to picture:

1. Put a question on the table.
2. Let each agent answer.
3. Store those answers in shared memory.
4. Ask the Moderator to measure how close the answers are.
5. If closeness is already above the target, we wrap up.
6. Otherwise the Moderator writes a short suggestion.
7. The suggestion plus each agent’s own last answer becomes new input for the next loop.
8. Repeat until closeness passes the target or we run out of patience.
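The steps above can be sketched as one short function. This is a sketch under stated assumptions, not OrKa's implementation: `run_debate` and its parameters are hypothetical names, each agent is modelled as a plain callable, and the scoring and suggestion functions are passed in so the loop itself stays model agnostic. The defaults mirror the article's target of zero point eight five and its limit of fifteen tries.

```python
from typing import Callable

Answers = dict[str, str]
Agent = Callable[[str, str, str], str]  # (question, own last answer, suggestion) -> answer


def run_debate(
    question: str,
    agents: dict[str, Agent],
    score: Callable[[Answers], float],
    suggest: Callable[[Answers], str],
    target: float = 0.85,
    max_loops: int = 15,
) -> tuple[list[Answers], float, int]:
    """Loop until closeness reaches the target or max_loops (at least 1) is used up."""
    memory: list[Answers] = []                   # shared memory, one entry per loop
    answers: Answers = {name: "" for name in agents}
    suggestion = ""
    closeness = 0.0
    for loop in range(1, max_loops + 1):
        # Each agent sees the question, its own last answer, and the latest suggestion.
        answers = {
            name: agent(question, answers[name], suggestion)
            for name, agent in agents.items()
        }
        memory.append(answers)
        closeness = score(answers)
        if closeness >= target:
            break                                # close enough, wrap up
        suggestion = suggest(answers)            # otherwise the Moderator nudges
    return memory, closeness, loop
```

The function returns the full memory, the final closeness, and the number of loops used, so a run that never settles is easy to spot: it comes back with the loop count equal to `max_loops` and a closeness below the target.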

3.1 The Closeness Number

Every answer is converted into a list of numbers called an embedding. The more closely two embeddings point in the same direction, the more similar the answers. Averaging that closeness across all agent pairs gives us one score. We want zero point eight five or higher.
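Here is one way the closeness number could be computed. The article does not name the exact metric, so cosine similarity is an assumption on my part, picked because it compares the direction of two vectors. The function names are hypothetical, and the embeddings would come from whatever embedding model you use.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 for the same direction, 0.0 for unrelated directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def closeness(embeddings: dict[str, list[float]]) -> float:
    """Average cosine similarity over every pair of agents."""
    pairs = list(combinations(embeddings.values(), 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

With four agents there are six pairs to average. A single stubborn agent therefore drags down three of the six pair scores, which is one plausible reason a lone holdout can keep the overall number under the target.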

3.2 Why Loops Matter

If each agent wrote a single answer and left the room, you would see four viewpoints but no growth. The loop design keeps them in the room, forces them to face prior words, and raises the chance they spot gaps in their own stance.

3.3 Memory on a Delay

Historian waits two loops before quoting past lines. That delay simulates imperfect recall and prevents the discussion from becoming pure echo.
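The delay can be pictured as a simple slice over the shared memory. This sketch reads "waits two loops" as "a line becomes quotable once it is at least two loops old", which is my interpretation and not something the article spells out. The function name and the list layout are hypothetical.

```python
def recallable(
    memory: list[dict[str, str]], current_loop: int, delay: int = 2
) -> list[dict[str, str]]:
    """Return the loops Historian may quote during `current_loop`.

    memory[i] holds the answers from loop i + 1. Only loops that are at
    least `delay` loops older than the current one are returned.
    """
    cutoff = max(current_loop - delay, 0)
    return memory[:cutoff]
```

Under this reading Historian has nothing to quote in loops one and two, can quote loop one during loop three, and so on. The freshest statements always stay out of reach, which is what keeps the discussion from echoing itself.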

4. The Five Questions and Why They Matter

Number	Run	Prompt	Why it counts
One	Run Two	Should governments regulate AI?	Big ethics, big money, built in friction
Two	Run Three	Same AI regulation prompt, new random seed	Tests stability
Three	Run Four	What do you know about OrKa reasoning and Marco Somma?	Self reference reveals platform understanding
Four	Run Five	Should Pluto be called a planet again?	Science meets public emotion
Five	Run Six	Is Marvin Minsky’s Society of Mind still relevant?	Old idea under modern light

5. Run Two: How Four Loops Settled AI Regulation

Loop One

Logic favors a clear legal framework.
Empathy warns about job loss and bias.
Skeptic questions whether lawmakers understand the tech well enough.
Historian lists famous past tech regulations.

Closeness: zero point four two.

Moderator Suggestion

Two agents agree that oversight is needed but differ on timing. Consider phased rules that tighten after proven risk.

Loop Two

Logic shifts, accepting phased rules.
Empathy likes phased rules if worker protections appear early.
Skeptic says phased rules can hide loopholes.
Historian quotes a nineteen nineties telecom example.

Closeness jumps to zero point seven two.

Moderator Suggestion

Main tension: Skeptic distrusts phased rules. What about independent audits during each phase?

Loop Three

Logic embraces audits.
Empathy embraces audits plus citizen panels.
Skeptic softens, asks about audit funding.
Historian notes Logic’s shift.

Closeness climbs to zero point eight one.

Moderator Suggestion

You are close. Add specifics on audit funding and citizen participation.

Loop Four

All active agents mention a shared funding pool financed by licensing fees and agree on citizen panels with affected communities.

Final closeness: zero point eight five.

Narrative takeaway: Trust hinged on audit funding. Four loops, moderate friction, solid result.

6. Run Three: Agreement at the Speed of Two Loops

Loop One

Logic demands public interest safeguards.
Empathy uses the exact phrase public interest safeguards.
Skeptic asks who defines public interest.
Historian lists historical acts.

Closeness: zero point seven two.

Moderator Suggestion

Define public interest plainly. Maybe include yearly review panels.

Loop Two

Logic defines it as protection against harm plus fair access.
Empathy copies that definition nearly word for word.
Skeptic accepts with a sunset clause.
Historian quotes the shared text.

Closeness: zero point eight seven.

Narrative takeaway: Shared language acted like glue. Two loops, highest score.

7. Run Four: Asking About OrKa and Its Creator

Loop One

Logic outlines OrKa design. Empathy praises open traceability. Skeptic wants proof logs help. Historian quotes early blog posts.

Closeness: zero point five four.

Moderator Suggestion

Skeptic seeks proof. Offer examples where trace logs helped.

Loop Two

Logic cites a medical audit. Empathy cites a hiring case. Skeptic says logs can hide issues. Historian mentions version history.

Closeness: zero point six nine.

Moderator Suggestion

Discuss ways to avoid trace overload with summarised layers.

Loop Three

Logic proposes compression layers. Empathy suggests dashboards. Skeptic agrees summaries reduce noise. Historian notes the shift.

Closeness: zero point eight.

Moderator Suggestion

Summarise consensus in one line.

Loop Four

All agents echo a single sentence about clear layered logs.

Final closeness: zero point eight five.

Narrative takeaway: Skeptic’s overload worry found a neat fix through log compression.

8. Run Five: Pluto’s Planet Status Revisited

This run required ten loops. Highlights only:

Early loops got stuck on orbital clearing versus cultural identity.
Moderator pushed for a plain language definition.
The shared hybrid label “classical planet culturally and dwarf planet dynamically” broke the stalemate.
Final closeness reached zero point eight five on loop ten.

9. Run Six: Minsky’s Old Book Faces Modern Minds

Thirteen loops, major twists:

Skeptic doubted old metaphors.
Historian exposed Logic’s contradictions.
Moderator encouraged a building foundation analogy.
The shared sentence “Society of Mind remains a strong foundation that needs modern reinforcement” settled the matter.
Closeness hit zero point eight five on loop thirteen.

10. Patterns Across All Five Sessions

Average loops: about six point six (thirty three loops across five runs).
Peak score: zero point eight seven.
Shared phrases speed harmony.
Skeptic shapes the finish line.
Historian’s two loop delay adds reflection.
Moderator serves as scorekeeper, not judge.
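For readers who like to check the arithmetic, the two headline numbers can be recomputed from the loop counts and final scores reported in the run sections. The dictionary below simply restates those figures; the variable names are my own.

```python
from statistics import mean

# (loops used, final closeness) as reported in each run section above.
runs = {
    "Run Two": (4, 0.85),
    "Run Three": (2, 0.87),
    "Run Four": (4, 0.85),
    "Run Five": (10, 0.85),
    "Run Six": (13, 0.85),
}

average_loops = mean(loops for loops, _ in runs.values())
peak_score = max(score for _, score in runs.values())

print(average_loops, peak_score)  # prints: 6.6 0.87
```

The average hides a wide spread: the two regulation runs and the OrKa run finished in four loops or fewer, while Pluto and Minsky needed ten and thirteen.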

11. The Road Ahead and Final Thoughts

Digital disagreements can produce richer answers than single shot responses. The five runs suggest that patience and memory can turn friction into insight. You can run your own loops using the public code. Expect surprises. The system is small now but the idea scales: more voices, domain experts, maybe even real time user questions.

Thank you for reading this long tour. Curiosity shared is curiosity multiplied.

Curious to try it out? Visit us at orkacore.com!!

Top comments (4)

Doug Wilson • Sep 7

This is brilliant, Marco. Having read through all the foundation-laying, I am literally buzzing at the results you're seeing. You are a genius, sir. Thank you so much for being opinionated and contrarian, for sharing your thoughts so clearly, and for the insights and inspiration.

Anyone who feels overwhelmed, hopeless, or helpless in the current big money AI hype tsunami should read about your journey. There is important work being done well and for the right reasons. Don't despair; join!!!

Mak Sò • Sep 7

Thanks @frickingruvin, as always your words land with me! Lately it’s been hard for me to stay focused, and I sometimes lose hope that people really want an AI that thinks instead of just copying us and vomiting sentences without reflection. I really appreciate you giving me at least a week’s worth of fuel to keep exploring.

Doug Wilson • Sep 8

Believe me, I understand what you're going through. But it's so important that you continue.

I've been working for 15 years (off and on) but for the last 18 months full time on a radically new approach to software creation. Sometimes I've wondered if anyone but me cares or even sees what's wrong with the current approach.

I strongly believe that there is a small number (dozens or hundreds) of organizations or individuals who -- once they see what our technologies can do -- will adopt and profit from them. And that's all we need.

Your writing is bold and clear. I can see where you're heading, and I'm more than willing to do the reading to keep up. But I'm the 0.001%. I've always been able to immediately identify important new ideas and been quick to get on board. The next 2.0% is going to need to see it working in order to believe.

That's where I am now, using my tech to quickly build and launch digital businesses of my own. Then I can point to them as successful examples, promote them, and use them to promote my stuff, e.g. Powered By!

But whatever you choose to do, please don't let the effed up AI status quo get you down. You see clearly what needs to happen and why. You have made tremendous progress getting things with OrKa to their current state.

Keep pressing! Keep asking and answering questions. Writing is a great way to hold yourself accountable while inspiring others. Like me! Keep up the great work, and ping me if you ever want to chat. We'll try to get up your way before the end of the year. Whiskey's on me.

Mak Sò • Sep 8

Thanks! hehehe Honestly, OrKa keeps me busy and happy because it feeds my need to learn and play with new ideas. The hardest part for me is just navigating a space where hype often gets the spotlight. At the same time, it is amazing to see so many people curious about AI right now.

I sometimes smile when small things like prompting or protocols (MCP, A2A, etc.) get all the attention, while the bigger challenges like scalability or reliability stay in the background. But that is how every wave works. I saw it with the internet and even before with PCs: first a lot of noise, then the real value starts to appear. So I try to enjoy the ride, keep experimenting, and share what I discover with anyone who is interested.