How to Feed 100K Words to AI (Without Breaking the Bank)

#ai #llm #api #programming

128K context sounds great — until your prompts cost $2 each. Here's how to optimize tokens and process massive documents for pennies.

You got access to 128K context. Excited, you paste your entire codebase. Then you check the bill.

100K tokens per request × $2.80/1M tokens = $0.28 per call. Not bad for one request. But 1,000 calls? $280.
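To run the same arithmetic for your own workload, here is a minimal sketch. The function name `estimate_cost` is made up for illustration, and the $2.80 per million tokens rate is simply the figure used above, so plug in your provider's actual pricing.

```python
def estimate_cost(tokens_per_call: int, calls: int, usd_per_million_tokens: float) -> float:
    """Return the total input-token cost in USD (hypothetical helper)."""
    return tokens_per_call * calls * usd_per_million_tokens / 1_000_000


# Figures from the example above: 100K tokens per call at $2.80 per 1M tokens
per_call = estimate_cost(100_000, 1, 2.80)
batch = estimate_cost(100_000, 1000, 2.80)
print(f"${per_call:.2f} per call, ${batch:.2f} per 1000 calls")
```

This only counts input tokens. Output tokens are usually billed separately, so treat the result as a lower bound.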

Here's how to process massive documents smarter.

1. Trim Before Sending

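def trim_context(text, max_chars=4000):
    """Collapse whitespace and cap the text at max_chars characters."""
    # Collapse runs of spaces, tabs, and newlines into single spaces
    text = " ".join(text.split())
    # Cut anything past the limit and mark the cut so the model knows text is missing
    if len(text) > max_chars:
        text = text[:max_chars] + "...[truncated]"
    return text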

Savings: roughly 60% fewer tokens on verbose documents; the exact figure depends on the input.
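How much trimming saves depends heavily on the input, so it is worth measuring on your own documents. The sketch below is one rough way to do that. Both helper names are hypothetical, and the "4 characters per token" rule is only a common heuristic for English text, not a real tokenizer.

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: assumes ~4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def trim_savings(text: str, max_chars: int = 4000) -> float:
    """Fraction of estimated tokens saved by collapsing whitespace and truncating."""
    trimmed = " ".join(text.split())[:max_chars]
    before = rough_token_count(text)
    after = rough_token_count(trimmed)
    return 1 - after / before


# A whitespace-heavy sample, repeated to simulate a longer file
sample = "def  foo():\n\n\n        return   1\n" * 50
print(f"Estimated savings: {trim_savings(sample):.0%}")
```

For exact numbers, swap `rough_token_count` for your provider's tokenizer.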

2. Chunk + Summarize (RAG-Lite)

from openai import OpenAI

# Assumption: an OpenAI-compatible SDK client; pass api_key and base_url for your provider
client = OpenAI()

def chunk_and_summarize(text, user_question, chunk_size=2000):
    """Split large docs, summarize each chunk, then answer from the combined summaries."""
    # Fixed-size character slices; the last chunk may be shorter
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

    summaries = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=[{"role": "user", "content": f"Summarize:\n{chunk}"}],
            max_tokens=100,  # Cap each summary to limit output cost
        )
        summaries.append(response.choices[0].message.content)

    # Combine the summaries and ask the final question once
    combined = "\n".join(summaries)
    final = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": f"Based on: {combined}\n\nAnswer: {user_question}"}],
    )
    return final.choices[0].message.content
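One caveat with fixed-size slicing: it can cut a sentence or a function in half, which makes the per-chunk summaries worse. A possible alternative is to split on paragraph boundaries instead. The sketch below is an illustration only; `chunk_by_paragraph` is a hypothetical helper, and it assumes paragraphs are separated by blank lines.

```python
def chunk_by_paragraph(text: str, chunk_size: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most chunk_size characters where possible.

    A single paragraph longer than chunk_size is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # Skip empty paragraphs left by extra blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size or not current:
            # Still fits, or this is the first paragraph of a new chunk
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk, start a new one
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in chunk_by_paragraph(doc, chunk_size=40):
    print(repr(chunk))
```

The output of this helper could replace the list comprehension that builds `chunks`, leaving the rest of the summarize loop unchanged.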