Nguyen Hien

ChatGPT Creates a New MCP Session for Every Tool Call. Claude Doesn't.

I caught something weird.

I'm building mcpr, an open-source proxy for MCP servers. The cloud dashboard tracks every MCP request at the protocol level, including session lifecycle. Initialize calls, tool invocations, session IDs, latencies. Everything.

While monitoring a production MCP server that serves both ChatGPT and Claude simultaneously, I noticed a pattern that made me do a double-take:

ChatGPT: 2 tool calls. 2 separate sessions.
Claude: 2 tool calls. 1 session.

Same server. Same tools. Same protocol. Completely different behavior.

Let me show you.

The raw data

Here's exactly what the dashboard recorded for a simple interaction where the AI calls two tools back-to-back. You can reproduce this with any MCP server you have running: point both ChatGPT and Claude at it and watch the session IDs.

ChatGPT — one session per tool call

Session 1:
  04:02:40 PM  initialize              3ms   ok
  04:02:40 PM  tools/call  create_matching_question  12ms  ok
  -- session ended --

Session 2:
  04:03:47 PM  initialize              3ms   ok
  04:03:47 PM  tools/call  submit_answer            12ms  ok
  -- session ended --

Two sessions. Two full initialize handshakes. Each session lives for roughly one second — just long enough to shake hands, call a tool, and disappear.

[Dashboard screenshot: ChatGPT creates two separate MCP sessions for two tool calls]

Claude — one session, many calls

Session 1:
  02:03:06 PM  initialize              30ms  ok
  02:03:07 PM  tools/list              11ms  ok
  02:03:08 PM  resources/list          14ms  ok
  02:03:38 PM  resources/read           4ms  ok
  02:03:41 PM  tools/call  create_cloze_question    35ms  ok
  02:03:45 PM  tools/call  get_latest_answer         6ms  ok
  -- session ended --

One session. One initialize. Claude even runs discovery (tools/list, resources/list, and resources/read) before making any tool calls, all within the same session. Total duration: 39 seconds.

[Dashboard screenshot: Claude reuses a single MCP session across multiple tool calls]

Why this matters more than you think

1. Initialize is not free

Every MCP initialize is a full handshake. The client sends its capabilities, the server responds with its own, they negotiate a protocol version. Some servers also load config, set up database connections, or warm caches during init.
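
For reference, here's roughly what one round of that handshake carries. This is a sketch following the shapes in the MCP spec; the client and server names, versions, and capabilities are placeholders:

# A sketch of the initialize exchange, shapes per the MCP spec.
# Names, versions, and capabilities here are placeholders.

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "1.0.0"},
    },
}

initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {}, "resources": {}},
        "serverInfo": {"name": "example-server", "version": "1.0.0"},
    },
}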

ChatGPT pays this cost on every single tool call. Claude pays it once.

A conversation that triggers 10 tool calls means 10 handshakes on ChatGPT vs 1 on Claude. If your initialize takes 30-50ms — which is modest — you're adding 300-500ms of pure overhead that your users feel but can't explain.

2. Your in-memory state is gone

This is the sneaky one. The silent killer.

If your MCP server stores anything in memory per session — user context, conversation history, cached API responses, computed state — ChatGPT will destroy it between tool calls.

# This pattern works perfectly on Claude.
# On ChatGPT, it's a landmine.

session_cache = {}

async def handle_initialize(session_id):
    session_cache[session_id] = {"user": None, "history": []}

async def handle_tool_call(session_id, tool, args):
    # On Claude: same session_id, cache hit, everything works
    # On ChatGPT: NEW session_id, cache miss, data is gone
    cache = session_cache.get(session_id)  # None on ChatGPT!

You test on Claude. Everything works. State persists across tool calls. You ship it. Then ChatGPT users start reporting bugs — results missing context, follow-up calls returning empty data, conversations that seem to "forget" what just happened.

The worst part? Your server logs show zero errors. Every individual request succeeds. The failure is between requests, in the gap where your state quietly vanishes.

3. Tool discovery follows different paths

Look at the session data again. Claude calls tools/list and resources/list during the session. It discovers what's available, reads resources, then acts on what it learned.
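
For context, discovery is just more JSON-RPC. A sketch of the tools/list exchange, with shapes per the MCP spec and payloads abbreviated (the tool name is taken from the session log above):

# What Claude's discovery calls look like on the wire (abbreviated).
tools_list_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

tools_list_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [
            {
                "name": "create_cloze_question",
                "description": "...",
                "inputSchema": {"type": "object", "properties": {}},
            },
        ]
    },
}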

ChatGPT skips all of this. It goes straight from initialize to tools/call, no discovery phase, which suggests ChatGPT caches the tool schema externally and doesn't need to rediscover it per session.

This is actually clever engineering on ChatGPT's side: if you're going to throw away the session anyway, why waste time discovering what you already know?

How to build MCP servers that survive both models

The rule is simple: design for the worst case.

Make initialize blazing fast

ChatGPT will call it constantly. Every millisecond in init multiplies across every tool call in a conversation.

# Bad: heavy init that ChatGPT will pay for on every tool call
async def handle_initialize(session_id):
    await load_database_schema()      # 200ms
    await warm_embedding_cache()       # 500ms
    await fetch_user_preferences()     # 100ms
    # Total: 800ms per tool call on ChatGPT

# Good: return immediately, defer everything
async def handle_initialize(session_id):
    return {"capabilities": {...}}     # < 5ms
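
If the heavy work genuinely has to happen, pay for it once per process instead of once per session. A minimal sketch, reusing the hypothetical load_database_schema helper from the "bad" example above:

import asyncio

# Cache the expensive setup at process level so it survives ChatGPT's
# session churn. load_database_schema is the same hypothetical helper
# as in the "bad" example above.
_schema = None
_schema_lock = asyncio.Lock()

async def get_schema():
    global _schema
    async with _schema_lock:  # avoid duplicate loads under concurrent calls
        if _schema is None:
            _schema = await load_database_schema()  # paid once, not per session
    return _schema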

Go stateless or go home

Don't rely on session-scoped state. Period. Use external persistence keyed on something stable: user ID, API key, anything that survives a session reset.

import redis.asyncio as aioredis

redis = aioredis.from_url("redis://localhost")  # any external store works here

# Fragile: dies on ChatGPT
session_state = {}

# Robust: works everywhere, keyed on the user instead of the session
async def get_state(user_id):
    return await redis.get(f"user:{user_id}")

How I spotted this

A regular HTTP reverse proxy — nginx, HAProxy, Caddy — would see these as normal HTTP requests. It has no idea that two POST requests belong to different MCP sessions, or that initialize was called twice instead of once.

mcpr is different. It parses MCP JSON-RPC at the protocol level. It knows what initialize means, tracks session IDs, groups tool calls by session, and measures per-method latency. That's how a pattern like this surfaces in the dashboard instead of hiding in raw HTTP logs.
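
As a rough illustration (this is not mcpr's actual code), the core of that session tracking is small, assuming MCP's Streamable HTTP transport, where the server assigns a session via the Mcp-Session-Id header after initialize:

import json

# Illustrative only, not mcpr's implementation. Assumes the Streamable
# HTTP transport, where the session ID travels in the Mcp-Session-Id header.
def classify_request(headers: dict, body: bytes) -> tuple[str | None, str]:
    message = json.loads(body)
    session_id = headers.get("mcp-session-id")  # absent on the very first initialize
    method = message.get("method", "")          # "initialize", "tools/call", ...
    return session_id, method

# Group requests by session_id and the pattern jumps out:
# ChatGPT's IDs rotate on every tool call; Claude's stay stable.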

If you're running MCP servers in production, this kind of protocol-level visibility is the difference between guessing why things are slow and knowing.

The takeaway

ChatGPT and Claude have fundamentally different MCP session models:

                    ChatGPT                        Claude
Session lifetime    One tool call                  Entire conversation turn
Initialize calls    Once per tool call             Once per session
In-memory state     Lost between calls             Persists within session
Tool discovery      Skipped (cached externally)    Done within session

Design for the disposable model. If your server works on ChatGPT's session-per-call approach, it'll work everywhere. The reverse is not true.


mcpr is an observability-first proxy for MCP servers (Apache 2.0).

Top comments (5)

Renato Marinho

The session lifecycle discrepancy you've documented is important data — 2 tool calls, 2 sessions on ChatGPT vs. 1 session on Claude is a meaningful architectural difference with direct latency implications. The cold start overhead per tool call compounds fast in multi-step agent workflows.

Beyond the performance angle, the session model also has governance implications. If ChatGPT creates a new session per tool call, you get implicit session boundary isolation but lose continuity-based audit. If Claude maintains a persistent session, you have better context continuity but a single audit surface needs to capture the full session. Neither model today gives you cryptographically immutable per-call audit trails regardless of client behavior.

This is one of the problems Vinkius (vinkius.com) addresses at the infrastructure layer. Because it runs MCP servers inside V8 Isolate sandboxes, each tool call generates an independent SHA-256 audit record regardless of whether it arrives from a single session (Claude) or many sessions (ChatGPT). The governance is call-level, not session-level. The SDK is Vurb.ts. The mcpr proxy approach you're building is observability; Vinkius adds immutable audit on top.

Really valuable empirical contribution — protocol-level observability data like this shapes how the ecosystem understands the actual behavior differences between clients, not just the spec differences.

nvhung150196

That is amazing! Now we know how to deal with ChatGPT effectively. Good article.

Huong Le

Such a great read! This post was super eye-opening when it comes to debugging and monitoring MCP server actions. 👏

Rocky

good
