
Daniel Dong


How to Feed 100K Words to AI (Without Breaking the Bank)

128K context sounds great — until your prompts cost $2 each. Here's how to optimize tokens and process massive documents for pennies.

You got access to 128K context. Excited, you paste your entire codebase. Then you check the bill.

100K tokens per request × $2.80 per 1M tokens = $0.28 per call. Not bad for one request. But 1,000 calls? $280.
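To sanity-check a bill before sending anything, the arithmetic above can be wrapped in a small helper. This is a minimal sketch: `request_cost` is a hypothetical name, and the $2.80 per 1M tokens rate is simply the figure quoted above, not a confirmed price for any provider.

```python
def request_cost(tokens, price_per_million):
    """Return the dollar cost of sending `tokens` tokens at a per-million rate."""
    return tokens / 1_000_000 * price_per_million

per_call = request_cost(100_000, 2.80)   # one 100K-token request
total = per_call * 1000                  # the same request made 1,000 times
print(f"${per_call:.2f} per call, ${total:.2f} for 1,000 calls")
```

Swapping in your own token count and rate gives a quick estimate before you commit to a batch job.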

Here's how to process massive documents smarter.

1. Trim Before Sending

def trim_context(text, max_chars=4000):
    """Keep only what matters."""
    # Collapse runs of whitespace (spaces, tabs, newlines) into single spaces
    text = " ".join(text.split())
    # Hard-truncate to max_chars and append a marker so the model knows text was cut
    if len(text) > max_chars:
        text = text[:max_chars] + "...[truncated]"
    return text

Savings: around 60% fewer tokens on verbose documents; the exact figure depends on how much whitespace and filler the source contains.
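One way to check the savings for your own documents is to compare rough token estimates before and after trimming. The sketch below assumes a rule of thumb of about 4 characters per token for English text; real tokenizers vary by model, and `estimate_tokens` and `token_savings` are hypothetical helper names.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; an assumption, not a real tokenizer."""
    return max(1, len(text) // chars_per_token)

def token_savings(original, trimmed):
    """Fraction of estimated tokens removed by trimming."""
    before = estimate_tokens(original)
    after = estimate_tokens(trimmed)
    return 1 - after / before

doc = "word    " * 500                 # padded sample text, 4000 characters
collapsed = " ".join(doc.split())      # same whitespace collapse as trim_context
print(f"{token_savings(doc, collapsed):.0%} fewer estimated tokens")
```

For billing-grade numbers, the token counts reported in the API response are a better source than any character heuristic.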

2. Chunk + Summarize (RAG-Lite)

from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) and an OpenAI-compatible endpoint

client = OpenAI()  # set api_key / base_url for your provider

def chunk_and_summarize(text, user_question, chunk_size=2000):
    """Split large docs, summarize each chunk, then combine."""
    # Fixed-size character chunks; the last one may be shorter
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

    summaries = []
    for chunk in chunks:
        summary = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=[{"role": "user", "content": f"Summarize:\n{chunk}"}],
            max_tokens=100  # ← Cap response
        )
        summaries.append(summary.choices[0].message.content)

    # Combine summaries into final answer
    combined = " ".join(summaries)
    return client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": f"Based on: {combined}\n\nAnswer: {user_question}"}]
    )
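Slicing by character count can cut a word or sentence in half at a chunk boundary. One variation, sketched here as an assumption rather than the approach used above, is to pack whole words into chunks of at most `chunk_size` characters. `chunk_on_words` is a hypothetical helper.

```python
def chunk_on_words(text, chunk_size=2000):
    """Pack whole words into chunks no longer than chunk_size characters.

    A single word longer than chunk_size becomes its own oversized chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        extra = len(word) + (1 if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks

print(chunk_on_words("aaa bbb ccc ddd", chunk_size=7))  # → ['aaa bbb', 'ccc ddd']
```

The trade-off is that whitespace is normalized along the way, which is usually fine for prose but not for code or other layout-sensitive text.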

3. Use Cheaper Models for Preprocessing

| Task | Model | Cost |
| --- | --- | --- |
| Summarize chunks | glm-4-flash | $0.10/1M tokens |
| Final answer | deepseek-v4-pro | $1.40/1M tokens |

Strategy: Flash model for the heavy lifting, Pro model only for the final polish.
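The two-tier strategy can be expressed as a small routing table with a cost estimate per stage. This is a sketch under assumptions: the model names and prices are the ones listed in the table above, while `STAGES`, `pipeline_cost`, and the sample token counts are hypothetical.

```python
# Model names and prices (USD per 1M tokens) as listed in the table above.
STAGES = {
    "summarize": {"model": "glm-4-flash", "price_per_million": 0.10},
    "final": {"model": "deepseek-v4-pro", "price_per_million": 1.40},
}

def pipeline_cost(tokens_by_stage):
    """Sum the cost of each stage given its token count."""
    return sum(
        tokens / 1_000_000 * STAGES[stage]["price_per_million"]
        for stage, tokens in tokens_by_stage.items()
    )

# Hypothetical workload: 1M tokens summarized, 100K tokens sent to the final model.
cost = pipeline_cost({"summarize": 1_000_000, "final": 100_000})
print(f"${cost:.2f}")
```

Keeping the stage-to-model mapping in one place makes it easy to swap models as prices change.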

The Math

| Approach | 100K doc × 1000 calls | Cost |
| --- | --- | --- |
| Naive (pro model, full text) | 100M tokens | $280 |
| Trimmed + flash preprocess | 20M + 10M tokens | $32 |
| Savings | | 89% |
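The savings figure follows directly from the two cost totals. A minimal sketch, with `savings_pct` as a hypothetical helper name:

```python
def savings_pct(baseline_cost, optimized_cost):
    """Percentage saved relative to the baseline cost."""
    return (baseline_cost - optimized_cost) / baseline_cost * 100

print(f"{savings_pct(280, 32):.0f}%")  # → 89%
```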

Try it free: aibridge-api.com — 14 models, one API.
