DEV Community: Manikandan Pandurangan

Ever Wonder Which Movie Character You're Most Like? Try This ChatGPT Prompt First.

Manikandan Pandurangan — Mon, 29 Jun 2026 07:35:38 +0000

Before you read further - try this prompt yourself.
Replace {INDUSTRY} with any of these:

Tamil Movies · Telugu Movies · Malayalam Movies · Bollywood · Hollywood · Anime · Marvel · DC · TV Series

The Prompt

Based on everything you know about me from our previous conversations,
analyze my personality, thinking style, communication style, values,
career choices, strengths, weaknesses, spiritual interests,
and decision-making patterns.

Match me with the Top 10 fictional characters from {INDUSTRY}.

Rules:
- Match based on psychology, motivations, values, decision-making, and worldview.
- Do not match based on profession, appearance, or popularity.
- Rank from highest similarity to lowest.
- For each match provide:
  - Similarity score
  - Why we match
  - Strengths
  - Blind spots
  - Memorable scenes that reflect me
  - Which character I may become in 10 years

Finally, compare all characters in a summary table. Show a picture of the top-matching character first, found by searching the internet rather than generating a new image, and then give the explanation.

Run it. Get your result. Then come back.

My Result

I got Vikram from Vikram Vedha as my highest match.

Not because I work in law enforcement. Not because I look like Madhavan.

Because of how I think - systems first, patterns before conclusions, long-game over quick wins.

That answer didn't come from a single conversation. It came from how ChatGPT is built to remember. Here's a mental model for what's happening under the hood.

The Four Memory Types, Plus Retrieval and Inference, Behind That Answer

1. Working Memory - The Current Conversation

This is the context window. It holds what you typed just now, any follow-up instructions, and the format you asked for.

Without it, ChatGPT would forget your question halfway through answering it.

What it held for this prompt:

- The full character-matching instruction you just pasted
- Your follow-up: "explain only the top 3"

Think of it as RAM - fast, temporary, cleared when the conversation ends.

2. Episodic Memory - Past Conversations as Experiences

This is where things get more interesting.

Episodic memory stores time-stamped experiences, not raw facts. Not "user likes AI" - but something closer to "over multiple sessions this user returned to agentic architecture problems, debugged the same LangGraph loop three times, and asked follow-up questions about AWS Bedrock quotas."

What it pulled for my result:

- I spent two sessions building a multi-agent system and kept refining the orchestration logic
- I asked about meditation practices three times across different months

Those repeated moments built a behavioral fingerprint. That's what got mapped to Vikram's patience and systems thinking.

3. Semantic Memory - Stable Facts About You

This is the factual layer. Not experiences - just things that are true about you right now.

It doesn't remember when you said something. It just knows it.

What it knows about me:

- 14 years in software engineering, currently working on GenAI
- Practices yoga and has discussed spirituality on multiple occasions

These facts shift slowly. If you change jobs, the older entry gets replaced. If you mention a new interest enough times, it eventually writes over the old one.
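One way to picture the difference between the episodic and semantic layers is as two data structures: an append-only, time-stamped log next to a key-value store where a new value replaces the old one. The sketch below is purely illustrative. The class and field names are hypothetical, and it does not describe how ChatGPT stores memory internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One time-stamped experience (hypothetical structure)."""
    when: datetime
    summary: str


@dataclass
class MemoryStore:
    """Toy model: episodic memory appends, semantic memory overwrites."""
    episodes: list[Episode] = field(default_factory=list)
    facts: dict[str, str] = field(default_factory=dict)

    def record_episode(self, summary: str) -> None:
        # Episodic: every experience is kept, together with its timestamp.
        self.episodes.append(Episode(datetime.now(timezone.utc), summary))

    def set_fact(self, key: str, value: str) -> None:
        # Semantic: only the current value survives; no timestamp is kept.
        self.facts[key] = value


store = MemoryStore()
store.record_episode("Asked about meditation practices")
store.record_episode("Asked about meditation practices again")
store.set_fact("current_focus", "backend engineering")
store.set_fact("current_focus", "GenAI")  # replaces the older entry
```

After these calls the store holds two episodes but only one value for `current_focus`, which is the "it just knows it" behavior described above.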

4. Procedural Memory - How to Help You Specifically

This one is often confused with preferences. It's not preferences. It's learned interaction patterns.

It's not storing "user likes bullet points." It's storing something behaviorally observed: "when given long paragraphs, this user asks for a summary. When given code, this user asks for comments."

What it learned about me:

- I always ask for architecture diagrams alongside explanations
- I ask for comparisons rather than isolated descriptions

This is why two people can ask ChatGPT the same question and get differently formatted answers. The system has calibrated to each person's interaction style.

5. Retrieval - The Filter That Makes It Useful

This is not a memory type. It's the mechanism that makes the other four usable.

ChatGPT doesn't load every memory when you send a message. It retrieves only what's relevant to the current question.

When I asked "which Tamil movie character am I?" - it did not pull up:

- My AWS CLI errors from three weeks ago
- The ECS environment variable issue I debugged last month

It pulled personality signals, career direction, values, communication patterns.

That selective retrieval is why the answer feels focused rather than scattered.
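To make selective retrieval concrete, here is a toy filter that scores each memory by how many topic tags it shares with the question and keeps only the best matches. It is a sketch under stated assumptions: the `Memory` class, the tags, and the scoring rule are all hypothetical, and production systems typically rank by embedding similarity rather than tag overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    text: str
    tags: frozenset[str]  # hypothetical topic labels attached to each memory


def retrieve(query_tags: set[str], memories: list[Memory], top_k: int = 2) -> list[Memory]:
    """Return up to top_k memories that share at least one tag with the query."""
    scored = [(len(m.tags & query_tags), m) for m in memories]
    relevant = [(score, m) for score, m in scored if score > 0]
    # Highest overlap first; list.sort is stable, so ties keep insertion order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:top_k]]


memories = [
    Memory("AWS CLI errors from three weeks ago", frozenset({"aws", "debugging"})),
    Memory("Refined multi-agent orchestration logic", frozenset({"career", "thinking-style"})),
    Memory("Asked about meditation practices", frozenset({"values", "spirituality"})),
]

# A personality question carries personality-related tags, not infrastructure ones.
hits = retrieve({"values", "thinking-style", "career"}, memories)
```

The AWS CLI memory scores zero and is dropped, so it never reaches the reasoning step.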

6. Inference - What Gets Synthesized, Not Stored

Worth mentioning separately because people confuse it with memory.

Inference is not a memory system. It's a reasoning step that runs after retrieval.

ChatGPT was never told "Mani is a systems thinker." It inferred that from the pattern of questions I asked, the problems I returned to, and how I framed things over time.

That inference is what converted dozens of disconnected memories into a coherent personality profile - and eventually into a character ranking.

How It All Fits Together

Your current prompt
        │
        ▼
Working Memory (context window)
        │
        ▼
Retrieve relevant long-term memories
        │
        ├── Episodic (experiences)
        ├── Semantic (facts)
        └── Procedural (interaction style)
        │
        ▼
Inference (synthesize patterns)
        │
        ▼
Character ranking

The interesting part is not that it said "you're like Vikram."

The interesting part is the path it took to get there - and that the same path runs every time you talk to a memory-augmented AI system.

The Memory Architecture at a Glance

Memory Type	Human Analogy	What It Stores
Working Memory	Short-term memory	Current conversation context
Episodic Memory	Past experiences	Time-stamped conversations and events
Semantic Memory	General knowledge	Stable facts about you
Procedural Memory	Learned habits	Your learned interaction patterns
Retrieval	Remembering the right thing	Filters which memories are relevant
Inference (not a memory)	Drawing conclusions	Synthesizes patterns into a coherent picture

Try It Now

Pick your industry. Run the prompt. Share your character in the comments.

Then try to explain why the AI reached that conclusion. That's where the real learning is.

If you're an AI/ML engineer building memory-augmented systems, this mental model maps directly to RAG pipelines, long-term memory stores, and how retrieval affects response quality. The movie prompt is just a more memorable way to explain the architecture.

What If Your Employees Never Had to Know Which System to Check?

Manikandan Pandurangan — Wed, 24 Jun 2026 16:07:11 +0000

A practical look at building one AI desk that talks to your documents, your database, and the web. All at the same time.

The problem nobody talks about out loud

Someone on the operations team needs to know the incident response runbook for a specific system. They ask a colleague. That colleague isn't sure. They dig through Confluence, try a search, find something from 2022, hope it's still valid.

Meanwhile someone in data analytics wants yesterday's order count. They open a BI tool. Filter wrong. Give up. Ping the data team.

These are not technology failures. They're routing failures. The answers exist. Nobody knows where to look.

One Desk AI is a working attempt to fix that.

What it actually does

One question. One box. The system figures out where the answer is.

Ask it something about an internal process and it searches your uploaded documents using semantic (meaning-based) search. Ask it about data and it writes and runs a database query on your behalf. Ask it something general and it searches the web, reads the relevant pages, and gives you a summary.

You don't choose which mode. The system does.

The response comes back the same way every time: clean, with any personal data removed, and with a full trace of which agent ran, why it was chosen, and how long each step took.

The four agents behind the single answer

The system runs four specialized agents in sequence. Each one does exactly one job.

Knowledge Brain handles your internal documents. It uses vector search (think of it as search that understands meaning, not just keywords) over an OpenSearch index. If a question contains an organization name or mentions internal content, this agent runs.

SQL Agent handles data questions. It does not simply generate a query and run it. It generates the query, then has a second model verify it for safety before execution. This prevents the obvious disasters.

Research Agent handles everything else. It runs a Google search, reads the actual pages, and synthesizes a response. Not snippets. The full content.

Author Agent runs last on every response. It reformats the output for readability, strips any personally identifiable information, and is the only agent that writes to the user. One output point, always.

There is also a guardrail layer at the front. Before any agent runs, the input is checked for SQL injection attempts and similar attacks. Blocked inputs never reach the agents.
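The order of operations described above (guardrail first, then pick one agent) can be sketched in a few lines. This is a simplified stand-in, not the repository's code: the keyword lists, the regex, and the `route` function are hypothetical, and the post does not spell out how routing decisions are made beyond the organization-name rule.

```python
import re

# Hypothetical config values; the real system reads its settings from a YAML file.
KNOWN_ORGS = {"acme"}
DATA_HINTS = ("how many", "count", "total", "average")
INJECTION_PATTERN = re.compile(
    r"(;\s*drop\s+table|union\s+select|--\s*$)", re.IGNORECASE
)


def route(question: str) -> str:
    """Return the name of the agent that should handle the question."""
    if INJECTION_PATTERN.search(question):
        return "blocked"  # guardrail: the input never reaches an agent
    lowered = question.lower()
    # Naive substring checks; a real router would need something sturdier.
    if any(hint in lowered for hint in DATA_HINTS):
        return "sql_agent"
    if any(org in lowered for org in KNOWN_ORGS) or "policy" in lowered:
        return "knowledge_brain"
    return "research_agent"


print(route("What's our leave policy?"))
print(route("How many leave requests were filed last quarter?"))
```

Whatever the router returns, the Author Agent would still run afterwards, since it is the single output point.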

Why this matters for non-technical managers

The honest version: most AI tools companies buy are wrappers around a chat interface with one source of truth. Ask a question about a document, get an answer about the document. That's it.

This system connects three sources at once and decides between them per question. A new hire asking "what's our leave policy?" gets the HR document. A manager asking "how many leave requests were filed last quarter?" gets a live database query. No manual switching. No knowing in advance which system holds the answer.

The evaluation layer matters too. Every response is written to a database table: the original question, which agent handled it, the raw output, the final response, whether any PII was masked, and latency per stage. That's not for debugging. That's the audit trail compliance teams ask for and rarely get from AI tools.
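A minimal sketch of what such an audit row could look like, using SQLite in memory so it runs anywhere. The table name, column names, and sample values are assumptions for illustration; the actual schema lives in the repository.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE evaluation_log (
        id INTEGER PRIMARY KEY,
        question TEXT NOT NULL,
        agent TEXT NOT NULL,
        raw_output TEXT NOT NULL,
        final_response TEXT NOT NULL,
        pii_masked INTEGER NOT NULL,      -- 0 or 1
        stage_latency_ms TEXT NOT NULL    -- JSON object: stage name -> milliseconds
    )
    """
)


def log_response(question, agent, raw_output, final_response, pii_masked, stage_latency_ms):
    """Insert one audit row; values are bound as parameters, never string-formatted."""
    conn.execute(
        "INSERT INTO evaluation_log "
        "(question, agent, raw_output, final_response, pii_masked, stage_latency_ms) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (question, agent, raw_output, final_response,
         int(pii_masked), json.dumps(stage_latency_ms)),
    )
    conn.commit()


log_response(
    "What's our leave policy?", "knowledge_brain",
    "Raw text mentioning hr@example.com", "Leave policy summary ([EMAIL] masked)",
    True, {"route": 12, "agent": 840, "author": 310},
)
row = conn.execute(
    "SELECT agent, pii_masked, stage_latency_ms FROM evaluation_log"
).fetchone()
```

Storing the raw output next to the final response is what lets a reviewer see exactly what the masking step changed.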

Where it runs and what it costs to operate

The system deploys on AWS ECS Fargate, which means there are no servers to manage. The containers run as Fargate tasks, and the service can scale the number of tasks with demand. The AI model runs on AWS Bedrock (Claude 3.5 Sonnet), which means pay-per-use with no GPU procurement.

For a mid-sized company with moderate query volume the infrastructure cost is low. The bigger cost is the initial setup: getting documents into the search index, configuring which organization names the knowledge agent should recognize, and connecting the database.

Setup instructions are in the repository. The config that controls everything, including which organizations the system knows about, is a single YAML file. One edit, redeploy, done.

What this is not

It is not a replacement for a proper knowledge management system. If your internal documents are scattered, incomplete, or outdated, this system will surface scattered, incomplete, or outdated answers with confidence.

It does not handle ambiguous questions well. "How are we doing?" will confuse the routing layer. Specific questions get better answers.

It does not do anything about data quality in your database. If the data is wrong, the query results are wrong.

A note on the technical choices

The LangGraph version constraint is worth reading even if you never touch the code. The README documents which exact package versions work together and why upgrading them individually causes silent failures. That section alone is worth saving for anyone who has debugged a LangChain version mismatch at midnight.

The WebSocket implementation is also non-obvious. Long-running AI responses don't fit neatly into a standard HTTP request-response cycle. The system streams progress events back to the client so users see which agent is running while the answer is being assembled. The Angular integration contract is documented with working code.
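The streaming idea can be shown without any WebSocket library: an async generator emits a progress event per agent and then the final answer. This is a sketch only. The event shape, the agent sequence, and the function names are hypothetical, and the documented Angular contract in the repository is the real reference.

```python
import asyncio
import json


async def run_with_progress(question: str):
    """Async generator that yields JSON progress events, then the final answer."""
    for agent in ("guardrail", "knowledge_brain", "author"):
        yield json.dumps({"type": "progress", "agent": agent})
        await asyncio.sleep(0)  # stand-in for the real agent's work
    yield json.dumps({"type": "final", "answer": f"Answer to: {question}"})


async def collect(question: str) -> list[dict]:
    # A real handler would forward each event over the WebSocket as it arrives
    # instead of collecting the events into a list.
    return [json.loads(event) async for event in run_with_progress(question)]


events = asyncio.run(collect("What's our leave policy?"))
```

The client sees three progress events before the final one, which is what lets the UI show which agent is running.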

Guardrails and PII masking are built to extend

This is a multi-agent system, so the guardrail layer was designed as infrastructure rather than a fixed list of rules.

The input guardrail blocks SQL injection attempts before any agent runs. The Author Agent masks PII on every response, regardless of which agent produced the answer. Both sit at fixed points in the graph, so adding new checks means editing one file, not hunting across four agents.

The config that controls routing also controls which patterns the guardrail flags. A team in a regulated industry can add domain-specific risk patterns in the same YAML that defines their organization names. No code change required for the common cases.

The masking rules follow the same pattern. New entity types can be added through config. The current defaults cover emails, phone numbers, and national ID formats for Indian regulatory context (aligned with DPDP Act requirements). Adding new formats is a config entry.
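Here is roughly what config-driven masking can look like. Treat it as a sketch with assumptions: the rules are shown as a Python dict rather than the project's YAML, the labels are made up, and the regexes are simplified examples (an email, a 10-digit Indian mobile number, a PAN-style ID), not the repository's actual patterns.

```python
import re

# Hypothetical config, shown as a dict; the real system keeps this in YAML.
# Adding a new entity type means adding one entry here.
MASKING_RULES = {
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "PHONE": r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)",
    "PAN": r"\b[A-Z]{5}\d{4}[A-Z]\b",
}

COMPILED = {label: re.compile(pattern) for label, pattern in MASKING_RULES.items()}


def mask_pii(text: str) -> tuple[str, bool]:
    """Replace each match with its label; report whether anything was masked."""
    masked_any = False
    for label, pattern in COMPILED.items():
        text, count = pattern.subn(f"[{label}]", text)
        masked_any = masked_any or count > 0
    return text, masked_any


clean, was_masked = mask_pii("Mail ravi@example.com or call 9876543210.")
print(clean)
```

The boolean return value is the kind of flag that could feed the "whether any PII was masked" column in the audit trail.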

The design intent: guardrails in a multi-agent system should be easier to extend than a single-model wrapper because the insertion points are explicit and documented.

What's next

The current system handles text queries. The next logical step is audio: speak the question, get the answer read back. The architecture already supports it. Web search quality could also improve by adding domain filtering for trusted sources.

Read the full code

Everything described here is working code, not a demo or a mockup.

github.com/Manisoft55-lab/one-desk-ai

The README walks through everything from local setup with Docker Compose to full ECS Fargate deployment. The test client lets you try all four agent paths with one command.

If you build something on top of it or run into a version issue the docs don't cover, open an issue.

Built at Manisoft Labs. Questions about the architecture or deployment: find me on LinkedIn.

Don't Let Your Jarvis Become Ultron: A Field Guide to Testing Agentic AI Systems

Manikandan Pandurangan — Tue, 23 Jun 2026 01:46:43 +0000

Stage 1: Component tests. Write deterministic unit tests for each layer: test_research_agent.py, test_web_search_tool.py, test_user_profile_memory.py. Use mock data that your domain expert has signed off on. These run on every commit, cost nothing, and catch the obvious breakages before any LLM call gets billed. While you're here, stub the external APIs too (GA4, Shopify, Meta, OpenSearch). If a test goes red because Shopify was down, it isn't telling you anything about your agent.
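A minimal sketch of a Stage 1 test with a stubbed external API, using only the standard library. The `summarize_orders` helper and the `list_orders` method are hypothetical stand-ins for your own code and client, not real Shopify SDK calls.

```python
import unittest
from unittest.mock import Mock


def summarize_orders(shop_client) -> str:
    """Hypothetical agent helper: turns an order list into a one-line summary."""
    orders = shop_client.list_orders(status="paid")
    total = sum(order["amount"] for order in orders)
    return f"{len(orders)} paid orders, total {total}"


class TestSummarizeOrders(unittest.TestCase):
    def test_uses_stubbed_shop_api(self):
        # The stub stands in for the shop client; no network call is made.
        shop = Mock()
        shop.list_orders.return_value = [{"amount": 120}, {"amount": 80}]

        self.assertEqual(summarize_orders(shop), "2 paid orders, total 200")
        shop.list_orders.assert_called_once_with(status="paid")


# Run the test case programmatically so the sketch also works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSummarizeOrders)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the stub returns fixed data, the test is deterministic and free to run on every commit.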

Stage 2: The prompt repository. Sit with the domain expert and collect the sharpest prompts you can, the ones that force specific tools, functions, agents, and memory to fire. Tag each prompt with what it's supposed to exercise, and group prompts by business area so a change in one area only re-runs its own set. This is the most valuable thing you'll build, so treat it that way.

Two kinds of prompts people forget. First, the failure cases: out-of-scope questions, prompt injection, ambiguous input, empty or malformed tool responses, timeouts. In banking, a prompt that checks whether the agent correctly refuses to give financial advice is a real test, not an edge case. Second, the multi-turn cases. Memory bugs show up across a conversation, not in a single call. Does it carry context forward, drop it when it should, and never leak one user's profile into another user's session? That last one matters a lot under DPDP.
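One possible shape for the repository is a list of tagged cases that can be filtered by business area and checked for gaps. The sketch below assumes hypothetical prompts and field names; the component names echo the test files from Stage 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptCase:
    prompt: str
    business_area: str
    exercises: frozenset[str]  # tools, agents, or memory this prompt should trigger


REPOSITORY = [
    PromptCase("Compare last week's ad spend with revenue", "marketing",
               frozenset({"research_agent", "web_search_tool"})),
    PromptCase("What did I say my preferred report format was?", "profile",
               frozenset({"user_profile_memory"})),
    PromptCase("Summarize paid orders from yesterday", "sales",
               frozenset({"research_agent"})),
]


def cases_for(changed_areas: set[str]) -> list[PromptCase]:
    """Select only the prompts whose business area was touched by a change."""
    return [case for case in REPOSITORY if case.business_area in changed_areas]


def uncovered(components: set[str]) -> set[str]:
    """Components that no prompt in the repository claims to exercise."""
    exercised = set().union(*(case.exercises for case in REPOSITORY))
    return components - exercised
```

Failure cases and multi-turn cases fit the same structure; a multi-turn case would carry a list of prompts instead of one.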

Stage 3: Coverage and trajectory. Run the whole repository and confirm every agent and tool actually fired. That's the coverage check. Then go one level deeper and look at the path the agent took to get there. A tool firing isn't the same as the right tool firing, with the right arguments, in the right order, without three pointless detours, and recovering when a tool returned an error. This trajectory check is the part most teams skip, and it's the part that's specific to agents rather than plain LLM apps.
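A trajectory check can start as a comparison between the expected tool calls and the recorded ones. The sketch below covers tool name, arguments, order, and detours; it does not cover error recovery. All names here (`ToolCall`, `call`, `check_trajectory`, the tool names) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: tuple  # sorted (key, value) pairs so calls compare by value


def call(name: str, **kwargs) -> ToolCall:
    return ToolCall(name, tuple(sorted(kwargs.items())))


def check_trajectory(expected: list[ToolCall], actual: list[ToolCall],
                     max_extra: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the trajectory passed."""
    problems = []
    remaining = iter(actual)
    for step in expected:
        # Consuming the iterator enforces order: each expected call must
        # appear after the previous expected call.
        if not any(step == seen for seen in remaining):
            problems.append(f"missing or out of order: {step.name}{dict(step.args)}")
    extra = len(actual) - len(expected)
    if not problems and extra > max_extra:
        problems.append(f"{extra} extra tool call(s); allowed {max_extra}")
    return problems


expected = [call("web_search", query="leave policy"), call("summarize", max_words=100)]
detour = [expected[0], expected[0], expected[1]]   # searched twice for no reason
wrong_order = [expected[1], expected[0]]
```

Here `detour` fires every expected tool and would pass a plain coverage check, which is exactly the gap the trajectory check closes.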

Stage 4: Versioned runs and capture. Stamp every run with a version (gpt-5.5-upgrade-20260623) and store every response against it. Now regression is something you can point to instead of something you argue about. Two additions. Run each prompt several times rather than once, because the model is stochastic and a single scored run is closer to a coin flip than a test. Track the pass rate and the variance. And capture cost, tokens, latency, and tool-call count on every run, because the upgrade decision is a trade-off. "Four percent more accurate at three times the tokens and twice the latency" is a business call, and you can't make it without those numbers in front of you.
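The "run it several times" idea, sketched with a simulated agent so the numbers are reproducible. `run_prompt_once` is a hypothetical stand-in for one scored run against the real model, and the 80% pass probability is invented for the example.

```python
import random
from statistics import mean, pvariance


def run_prompt_once(rng: random.Random) -> bool:
    """Hypothetical stand-in for one scored agent run; passes about 80% of the time."""
    return rng.random() < 0.8


def repeated_pass_stats(n_runs: int, seed: int = 0) -> dict:
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    outcomes = [1.0 if run_prompt_once(rng) else 0.0 for _ in range(n_runs)]
    return {
        "runs": n_runs,
        "pass_rate": mean(outcomes),
        # For 0/1 outcomes the population variance equals p * (1 - p).
        "variance": pvariance(outcomes),
    }


stats = repeated_pass_stats(n_runs=20)
print(stats)
```

In a real suite you would store these numbers under the run's version stamp, next to the cost, token, latency, and tool-call counts.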

Stage 5: Ground truth store. Keep domain-expert-verified ground truths for each prompt and tool, versioned the same way (...-20250510). One thing to decide early: who is allowed to change a ground truth, and how that approval gets recorded. When the product changes for real, old ground truths go stale, and without an update process the suite slowly starts failing things that are actually correct.

Stage 6: The evaluator. Score each candidate run against the ground truth using Ragas plus an LLM judge, on precision, recall, completeness, correctness, and whatever else the business asks for. The catch is that your judge is also an LLM, with its own biases toward longer answers, toward whatever comes first, toward its own style. Keep a small set of human-labelled examples and check how often the judge agrees with the humans. If you don't, you get metrics that are wrong and confident at the same time.
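Checking the judge against humans can be as simple as an agreement rate. The sketch below also computes Cohen's kappa, which corrects for chance agreement; kappa is my addition, not something the workflow above prescribes, and the labels are made up.

```python
def agreement_rate(human: list[bool], judge: list[bool]) -> float:
    """Fraction of examples where the judge's verdict matches the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human: list[bool], judge: list[bool]) -> float:
    """Agreement corrected for chance, for binary pass/fail labels."""
    observed = agreement_rate(human, judge)
    p_human = sum(human) / len(human)
    p_judge = sum(judge) / len(judge)
    expected = p_human * p_judge + (1 - p_human) * (1 - p_judge)
    if expected == 1.0:
        # Kappa is undefined when both raters used one label throughout;
        # this sketch treats that case as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight human-reviewed examples.
human_labels = [True, True, False, True, False, False, True, True]
judge_labels = [True, True, True, True, False, False, True, False]

print(agreement_rate(human_labels, judge_labels))
print(round(cohens_kappa(human_labels, judge_labels), 3))
```

A raw agreement rate can look healthy while kappa stays low, which happens when most examples pass and the judge simply says "pass" a lot.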

Stage 7: Dashboard and human review. Surface the low-scoring cases and let a human confirm or correct both the ground truth and the new response. The same screen does double duty: the labels people produce here are what you use to calibrate the judge in Stage 6.

Stage 8: CI/CD. Decide where this runs. Component tests on every pull request, the full evaluation suite nightly and before a release, and a gate that blocks the deploy when scores fall below a threshold. A suite that nothing in your pipeline calls won't get run, and won't get maintained.
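The deploy gate itself can be a small script whose exit code the pipeline respects. This is a sketch: the metric names, thresholds, and scores are hypothetical, and how the scores reach the script depends on your evaluator.

```python
def gate(scores: dict[str, float], thresholds: dict[str, float]) -> int:
    """Return a process exit code: 0 to allow the deploy, 1 to block it."""
    failures = [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {minimum:.2f}"
        for metric, minimum in thresholds.items()
        if scores.get(metric, 0.0) < minimum  # a missing metric counts as failing
    ]
    for line in failures:
        print("BLOCKED", line)
    return 1 if failures else 0


# Hypothetical thresholds and nightly scores.
THRESHOLDS = {"correctness": 0.85, "completeness": 0.80}
exit_code = gate({"correctness": 0.91, "completeness": 0.74}, THRESHOLDS)
# In a pipeline step this would end with: sys.exit(exit_code)
```

Treating a missing metric as a failure keeps a broken evaluator from silently letting a release through.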