Langchain — Token limitation handling strategies

Chandan — Sun, 18 May 2025 17:40:40 +0000

When working with LangChain to handle large documents or complex queries, managing token limitations effectively is essential. Here are some strategies to ensure efficient and meaningful responses without exceeding token limits:

1. Context Management

Conversation History Trimming: Retain only the most relevant parts of the conversation or truncate earlier parts of a lengthy chat. This minimizes the token load while preserving essential context.
Memory Management: Use memory types like ConversationBufferWindowMemory to keep only recent interactions or critical points from a conversation. This approach helps maintain continuity without adding all previous tokens.

2. Chunking Techniques

Split long documents into smaller, manageable chunks of text (e.g., by paragraph or sentence). Each chunk is then processed independently, ensuring token limits aren’t breached in any single chunk.
Sliding Windows: Overlap text segments slightly to preserve context across chunks, helping maintain the flow of information.

3. Stuffing Technique

This method attempts to “stuff” as much context as possible within the token limit, particularly for shorter documents.
It’s a simple approach where all relevant chunks are concatenated and passed to the model in a single request.
useful only when document size is close to or just under the token limit.

from langchain import OpenAI
from langchain.chains import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader

# Step 1: Load your documents
# For demonstration, we will load a sample text file. Replace with your actual documents.
loader = TextLoader("path/to/your/document.txt")
documents = loader.load()

# Step 2: Initialize the language model
llm = OpenAI(model="gpt-3.5-turbo")  # Use your preferred model

# Step 3: Load the summarize chain with the "stuff" method
summarize_chain = load_summarize_chain(llm, chain_type="stuff")

# Step 4: Run the summarization chain
summary = summarize_chain.run(documents)

# Step 5: Print the final summary
print("Final Summary:")
print(summary)

4. Map-Reduce Strategy

Map Phase: Each chunk of the document is processed independently, generating responses or summaries.
Reduce Phase: The results from each chunk are aggregated or summarized to produce a cohesive answer.
This is particularly effective when generating summaries or extracting information from large documents, as it breaks down tasks into smaller pieces.

from langchain import OpenAI
from langchain.chains import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

# Step 1: Define your large document text
large_document = """
# Your long text or document content here.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua...
"""

# Step 2: Split the document into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_text(large_document)

# Step 3: Create Document objects for each chunk
documents = [Document(page_content=chunk) for chunk in chunks]

# Step 4: Initialize the LLM (e.g., OpenAI's GPT-3 or GPT-4)
llm = OpenAI(model="gpt-3.5-turbo")  # Adjust model as needed

# Step 5: Load the Map-Reduce summarize chain
summarize_chain = load_summarize_chain(llm, chain_type="map_reduce")

# Step 6: Run the chain on the documents and get the final summary
final_summary = summarize_chain.run(documents)

print("Final Summary:")
print(final_summary)

5. Refine Strategy

In this approach, a base response is generated from the first chunk of text.
For each subsequent chunk, the model refines the initial response by adding, modifying, or clarifying information based on the new content.
This iterative refinement helps produce a nuanced answer that evolves with additional context.

from langchain import OpenAI, Document
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Step 1: Initialize the large document
large_document = """
# Place your large document text here. For example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus imperdiet, nulla et dictum interdum, nisi lorem egestas odio, vitae scelerisque enim ligula venenatis dolor...
"""

# Step 2: Split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_text(large_document)

# Step 3: Create Document objects for each chunk
documents = [Document(page_content=chunk) for chunk in chunks]

# Step 4: Initialize the LLM (e.g., OpenAI's GPT-3 or GPT-4)
llm = OpenAI(model="gpt-3.5-turbo")  # Replace with your desired LLM setup

# Step 5: Load the summarize chain with the refine option
refine_summarize_chain = load_summarize_chain(llm, chain_type="refine")

# Step 6: Run the chain on the documents
final_summary = refine_summarize_chain.run(documents)

print("Refined Summary:")
print(final_summary)

6. Condensed Summarization

Intermediate Summaries: For lengthy content, generate intermediate summaries for each section or chunk.
Combine these intermediate summaries to form a final summary that fits within the token limit.
This approach reduces the overall token load while retaining key information.

7. Keyword Extraction & Focused Retrieval

Instead of passing entire chunks, extract keywords or key phrases from each section and use these to construct a focused prompt.
This targeted approach minimizes token usage while ensuring relevant information is included in the model’s context.

8. Dynamic Chunking with Adaptive Summaries

Create dynamic, short summaries of longer chunks if token usage is approaching the limit. Adjust chunk size based on token availability to fit within constraints.
This strategy is particularly useful for handling unpredictable token lengths and helps ensure no information is omitted from the response.

9. Iterative Prompting

For tasks requiring complex answers, break down the query into smaller sub-queries. For instance, if the query is multifaceted, start with one aspect, generate a response, and iteratively build on it.
This segmented approach enables deeper exploration without crossing token boundaries.

Each of these strategies can be tailored depending on the document type, user requirements, and available resources. Combining multiple strategies can often yield optimal results, allowing LangChain to operate efficiently within token constraints.

DEV Community: Chandan