DEV Community

Mukunda Rao Katta

Stateful Rolling Chat History: Manage Conversation Memory Without Blowing the Context Window

Your agent handles multi-turn conversations. After turn 1, you have 2 messages. After turn 10, you have 20. After turn 50, you have 100 messages and you are approaching the context limit. You start dropping old messages. You drop too many and the conversation loses coherence. You drop too few and you hit the limit anyway.

Context management for multi-turn conversations is a calibration problem. llm-context-rotate is a stateful rolling history that handles the calibration for you.


The Shape of the Fix

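from anthropic import Anthropic
from llm_context_rotate import ContextRotate

anthropic_client = Anthropic()  # Reads ANTHROPIC_API_KEY from the environment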

ctx = ContextRotate(
    max_turns=20,      # Keep last 20 turns (40 messages)
    keep_first=2,      # Always keep first 2 messages (system-level context)
    preserve_pairs=True,  # Trim two messages at a time so pairs are dropped together
)

# Add messages as the conversation progresses
ctx.add_user("What was the revenue in Q3?")
ctx.add_assistant("Q3 revenue was $2.4M, up 18%.")
ctx.add_user("And Q4?")
ctx.add_assistant("Q4 results haven't been released yet.")

# Get the current window for the next LLM call
messages = ctx.messages()

response = anthropic_client.messages.create(
    model="claude-sonnet-4-6",
    messages=messages,
    max_tokens=1024,
)

messages() returns the pinned messages followed by the trimmed rolling window. You do not manage the window yourself: add messages as they arrive and call messages() before each LLM call.


What It Does NOT Do

llm-context-rotate does not summarize dropped messages. When old turns are rotated out, they are gone. No summarization pass. If you need semantic continuity across a very long conversation, you need a summarization strategy on top of the rotation.

It does not count tokens. It trims by turn count. For token-aware trimming, use prompt-token-counter to estimate token usage and gate on that estimate. Turn count is a fast but coarse proxy, because a single long message can outweigh many short turns.
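A token gate on top of turn-based trimming could look like the sketch below. It assumes a crude heuristic of about four characters per token and content stored as a list of text blocks; estimate_tokens and trim_to_budget are hypothetical helpers, not the prompt-token-counter API.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough estimate: about 4 characters per token (a heuristic assumption)."""
    total_chars = 0
    for msg in messages:
        for block in msg["content"]:
            if block.get("type") == "text":
                total_chars += len(block["text"])
    return total_chars // 4 + 1


def trim_to_budget(messages: list[dict], budget: int, keep_first: int = 0) -> list[dict]:
    """Drop the oldest unpinned messages, two at a time, until the estimate fits."""
    pinned, window = messages[:keep_first], list(messages[keep_first:])
    while len(window) > 2 and estimate_tokens(pinned + window) > budget:
        del window[:2]
    return pinned + window


# Build 10 turns of synthetic history
history = []
for i in range(10):
    history.append({"role": "user", "content": [{"type": "text", "text": f"question {i} " * 20}]})
    history.append({"role": "assistant", "content": [{"type": "text", "text": f"answer {i} " * 20}]})

trimmed = trim_to_budget(history, budget=300)
print(len(trimmed), estimate_tokens(trimmed))
```

A real tokenizer will give different numbers, so treat the budget as a safety margin rather than a hard limit.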

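It does not persist conversation history across process restarts. The history is in-memory. For persistence, serialize ctx.messages() with conversation-codec and reload it at the start of the next session. Note that messages() contains only the current window, so turns that were already rotated out are not saved.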
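If you only need the idea rather than the sibling library, a JSONL round trip can be sketched with the standard library. save_history and load_history are hypothetical names and do not reflect the conversation-codec API.

```python
import json
import os
import tempfile


def save_history(messages: list[dict], path: str) -> None:
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as fh:
        for msg in messages:
            fh.write(json.dumps(msg, ensure_ascii=False) + "\n")


def load_history(path: str) -> list[dict]:
    """Read the messages back in their original order."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


messages = [
    {"role": "user", "content": [{"type": "text", "text": "What was the revenue in Q3?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Q3 revenue was $2.4M, up 18%."}]},
]

path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
save_history(messages, path)
restored = load_history(path)
```

On the next session, the restored messages could be replayed into a fresh ContextRotate through add_user and add_assistant, which accept list content as well as strings.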


Inside the Library

A "turn" is one user/assistant exchange (2 messages). The rolling window keeps the last N turns:

from __future__ import annotations  # Lets the `str | list` hints load on Python < 3.10

from collections import deque

class ContextRotate:
    def __init__(
        self,
        max_turns: int = 10,
        keep_first: int = 0,
        preserve_pairs: bool = True,
    ):
        self._max_turns = max_turns
        self._keep_first = keep_first
        self._preserve_pairs = preserve_pairs
        self._history: deque[dict] = deque()
        self._pinned: list[dict] = []  # First N messages, never rotated

    def add_user(self, content: str | list) -> None:
        msg = {"role": "user", "content": content if isinstance(content, list) else [{"type": "text", "text": content}]}
        self._add(msg)

    def add_assistant(self, content: str | list) -> None:
        msg = {"role": "assistant", "content": content if isinstance(content, list) else [{"type": "text", "text": content}]}
        self._add(msg)

    def _add(self, msg: dict) -> None:
        if len(self._pinned) < self._keep_first:
            self._pinned.append(msg)
            return

        self._history.append(msg)
        self._trim()

    def _trim(self) -> None:
        # Max messages in rolling window = max_turns * 2
        max_messages = self._max_turns * 2
        while len(self._history) > max_messages:
            if self._preserve_pairs:
                # Drop the oldest pair (user + next assistant, or assistant + next user)
                if len(self._history) >= 2:
                    self._history.popleft()
                    self._history.popleft()
                else:
                    self._history.popleft()
            else:
                self._history.popleft()

    def messages(self) -> list[dict]:
        return self._pinned + list(self._history)

    def turn_count(self) -> int:
        return len(self._history) // 2

    def clear(self) -> None:
        self._history.clear()
        self._pinned.clear()

The keep_first parameter is for messages you always want in context, such as a system-level instruction sent as a user message or a background document injected at the start of the conversation. The first keep_first messages added are pinned, whatever their role, and are never rotated out.
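To see pinning and rotation together, here is a small standalone sketch that mirrors the logic of the class shown earlier with plain strings in place of message dicts. rolling_window is a hypothetical helper written for illustration, not part of the library.

```python
def rolling_window(messages: list[str], max_turns: int, keep_first: int) -> list[str]:
    """Pin the first keep_first messages and roll the rest, two at a time."""
    pinned, history = messages[:keep_first], []
    for msg in messages[keep_first:]:
        history.append(msg)
        # Same rule as _trim with preserve_pairs=True: drop the two oldest messages
        while len(history) > max_turns * 2:
            del history[:2]
    return pinned + history


conversation = ["doc", "ack"]  # the two messages that will be pinned
for turn in range(1, 6):
    conversation += [f"u{turn}", f"a{turn}"]

window = rolling_window(conversation, max_turns=2, keep_first=2)
print(window)  # ['doc', 'ack', 'u4', 'a4', 'u5', 'a5']
```

Because a whole pair is dropped as soon as the limit is exceeded, the window briefly holds max_turns * 2 - 1 rolling messages right after a new user message arrives.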


When to Use It

Use it for conversational agents that handle multi-turn user interactions. Customer support bots, interview assistants, and tutoring agents all fit: any agent that must maintain coherence across many turns while staying within context limits.

Use it when you want "just works" context management. The deque-based rotation is predictable: the agent always has the pinned messages plus at most the most recent N turns. There is no scoring or summarization logic to tune.

Use it alongside agent-message-sanitize to fix any structural issues before the messages reach the LLM. Rotation can produce a window that starts with an assistant message, for example when preserve_pairs=False and the user message that preceded it has just been rotated out. Sanitize ensures the window always starts with a user message.
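The leading-assistant case is easy to guard against by hand. The sketch below is a minimal stand-in for that single check; ensure_starts_with_user is a hypothetical helper and does not represent the agent-message-sanitize API.

```python
def ensure_starts_with_user(messages: list[dict]) -> list[dict]:
    """Return a copy with any leading non-user messages removed."""
    start = 0
    while start < len(messages) and messages[start]["role"] != "user":
        start += 1
    return messages[start:]


window = [
    {"role": "assistant", "content": "Q3 revenue was $2.4M, up 18%."},
    {"role": "user", "content": "And Q4?"},
    {"role": "assistant", "content": "Q4 results haven't been released yet."},
]

cleaned = ensure_starts_with_user(window)
print([m["role"] for m in cleaned])  # ['user', 'assistant']
```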

Skip it for single-turn agents. If your agent handles one request and terminates, there is no conversation to rotate.


Install

pip install git+https://github.com/MukundaKatta/llm-context-rotate

# Or from PyPI
pip install llm-context-rotate

Example: one rolling history per user session, with the account summary pinned as the first message.

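from anthropic import AsyncAnthropic
from llm_context_rotate import ContextRotate

anthropic_client = AsyncAnthropic()  # Reads ANTHROPIC_API_KEY from the environment
account_summary = "(account details loaded elsewhere)"  # Placeholder for this example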

# Session-scoped context (create one per user session)
ctx = ContextRotate(max_turns=15, keep_first=1, preserve_pairs=True)

# Initialize with a context document (pinned)
ctx.add_user(f"Here is the customer's account information:\n\n{account_summary}")

async def handle_message(user_message: str) -> str:
    ctx.add_user(user_message)

    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-6",
        system="You are a helpful customer support agent.",
        messages=ctx.messages(),
        max_tokens=512,
    )

    response_text = response.content[0].text
    ctx.add_assistant(response_text)

    return response_text

Sibling Libraries

Library                   What it solves
agent-message-window      Sliding window for agent loops (pair-preserving trim)
agent-message-sanitize    Fix structural issues in message lists
conversation-codec        Persist conversation history to JSONL
prompt-token-counter      Token counting for context budget tracking
agent-context-builder     Section-based system prompt assembly

The context management stack: llm-context-rotate for rolling conversation history, agent-message-window for agent loops, conversation-codec for persistence, prompt-token-counter for budget tracking.


What's Next

Turn-level summarization: an optional summarize_fn callback that receives rotated-out turns and returns a summary. The summary is injected as a special user message with a "conversation_summary" type marker, preserving semantic context across the rotation boundary.
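Since this feature is not implemented yet, the following is only a sketch of how such a callback might be wired up. rotate_with_summary, the top-level "type" key, and the stub summarizer are all assumptions made for illustration; the marker would need to be stripped before the messages are sent to an API.

```python
from typing import Callable


def rotate_with_summary(
    history: list[dict],
    max_messages: int,
    summarize_fn: Callable[[list[dict]], str],
) -> list[dict]:
    """Replace messages beyond the limit with a single summary message."""
    overflow = len(history) - max_messages
    if overflow <= 0:
        return list(history)
    overflow += overflow % 2  # Round up so whole pairs are rotated out
    dropped, kept = history[:overflow], history[overflow:]
    summary = {
        "role": "user",
        "type": "conversation_summary",  # Hypothetical marker, not an API field
        "content": [{"type": "text", "text": summarize_fn(dropped)}],
    }
    return [summary] + kept


def stub_summarize(dropped: list[dict]) -> str:
    """Stand-in for an LLM call that would summarize the dropped turns."""
    return f"{len(dropped)} earlier messages omitted"


history = [
    {"role": role, "content": [{"type": "text", "text": f"{role} {i}"}]}
    for i in range(3)
    for role in ("user", "assistant")
]

result = rotate_with_summary(history, max_messages=4, summarize_fn=stub_summarize)
```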

Importance scoring: a score_fn callback that scores each message for importance. Instead of always dropping the oldest messages, drop the lowest-scoring ones. Useful when conversations interleave critical instructions with routine exchanges.
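One possible shape for score-based dropping is sketched below. drop_lowest and the keyword-based score function are hypothetical; a real score_fn might use embeddings or an LLM judgment instead.

```python
from typing import Callable


def drop_lowest(
    messages: list[dict],
    max_messages: int,
    score_fn: Callable[[dict], float],
) -> list[dict]:
    """Keep the max_messages highest-scoring messages in their original order."""
    if len(messages) <= max_messages:
        return list(messages)
    # Rank indices by score, highest first; ties favor the more recent message
    ranked = sorted(
        range(len(messages)),
        key=lambda i: (score_fn(messages[i]), i),
        reverse=True,
    )
    keep = sorted(ranked[:max_messages])
    return [messages[i] for i in keep]


def keyword_score(msg: dict) -> float:
    """Toy scorer: messages flagged IMPORTANT outrank everything else."""
    return 1.0 if "IMPORTANT" in msg["content"] else 0.0


messages = [
    {"role": "user", "content": "IMPORTANT: never quote prices."},
    {"role": "assistant", "content": "Understood."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "What time is the call?"},
    {"role": "assistant", "content": "Noon."},
]

kept = drop_lowest(messages, max_messages=3, score_fn=keyword_score)
```

Dropping individual messages can break user/assistant alternation, so a scheme like this would likely need to score whole turns or be followed by a sanitizing pass.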

Snapshot export: ctx.snapshot() that returns the full history including rotated messages, timestamped. Useful for building a long-term memory layer on top of the rotation.


Built as part of the agent-stack family: composable Python primitives for production LLM agents.
