DeepSeek infinite context window

Models are now so cheap and strong that I came up with the idea of a simple "Agent" that refines an arbitrarily large ("infinite") context down to something manageable for the LLM to answer from, instead of using RAG. For very large contexts you could still combine RAG with this "infinite context" step to keep the price at bay.

How does it work?

  1. We take a long text and split it into chunks (like with any RAG solution)
  2. Until we have reduced the text to fit the model's context window, we repeat (a minimal sketch of this loop follows the list):
    1. We classify each chunk as either relevant or irrelevant with the model
    2. We keep only the relevant chunks
  3. We feed the high-quality context to the final model for answering (like with any RAG solution)
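To make the loop concrete, here is a minimal sketch of steps 1 and 2 using only the OpenAI client, without flashlearn. The function name, the prompt wording, the relevant/irrelevant labels, and the 64-chunk threshold are illustrative assumptions, not part of any library API.

# A minimal sketch of the reduce loop without flashlearn. Prompt wording,
# labels, and thresholds are illustrative assumptions.
def reduce_context(question, chunks, client, model_name, max_chunks=64, max_iterations=3):
    iterations = 0
    while len(chunks) > max_chunks and iterations < max_iterations:
        kept = []
        for chunk in chunks:
            # Ask the model for a one-word relevance verdict on the chunk.
            verdict = client.chat.completions.create(
                model=model_name,
                messages=[{
                    "role": "user",
                    "content": (
                        f"Question: {question}\n\nChunk:\n{chunk['text']}\n\n"
                        "Answer with exactly one word: relevant or irrelevant."
                    ),
                }],
            ).choices[0].message.content.strip().lower()
            if verdict.startswith("relevant"):
                kept.append(chunk)
        chunks = kept
        iterations += 1
    return chunks

The full code below does the same thing, but uses flashlearn's ClassificationSkill to build the classification tasks and run them in parallel.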

Full code

import os
from openai import OpenAI
from flashlearn.skills.classification import ClassificationSkill


def super_simple_chunker(text, chunk_size=1000):
    """Chunks text into smaller segments of specified size."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [{"text": chunk} for chunk in chunks]


def infinite_context(question, context):
    """
    Uses a language model to provide answers to a given question
    by processing a large context into relevant chunks.
    """
    # Step 1: Setup your provider

    # OpenAI setup - Uncomment if using OpenAI directly
    # os.environ["OPENAI_API_KEY"] = 'Your api key'
    # model_name = "gpt-4o-mini"
    # client = OpenAI()

    # DeepSeek setup
    model_name = 'deepseek-chat'
    client = OpenAI(
        api_key='YOUR DEEPSEEK API KEY',
        base_url="https://api.deepseek.com",
    )

    # Step 2: Chunk context like you would with any RAG flow
    chunks = super_simple_chunker(context)

    # Step 3: Initialize the Classification Skill - Classify relevant content
    skill = ClassificationSkill(
        model_name=model_name,
        client=client,
        categories=["relevant", "somehow_relevant", "irrelevant"],
        max_categories=1,
        system_prompt="Classify content based on relevancy to the task" f"{question}",
    )

    # Step 4: Prepare classification tasks (passing the list of dicts + columns to read)
    # and keep reducing until the context is small enough, with a cap on passes
    iterations = 0
    while len(chunks) > 64 and iterations < 3:
        tasks = skill.create_tasks(chunks)

        # Narrow down on quality context
        results = skill.run_tasks_in_parallel(tasks)
        print(results)

        # Step 5: Map results back onto the chunks and keep only the relevant ones
        for i, chunk in enumerate(chunks):
            chunks[i]['category'] = results[str(i)]['categories']
        chunks = [{'text': chunk['text']} for chunk in chunks if chunk['category'] == 'relevant']

        iterations += 1

    # Step 6: Answer from the reduced, high-quality context
    answer = client.chat.completions.create(
        model=model_name,
        messages=[
            {'role': 'user', 'content': str(chunks)},
            {'role': 'user', 'content': question}
        ]
    )

    return answer.choices[0].message.content


if __name__ == "__main__":
    # Read the long context file
    file_path = 'context.txt'
    with open(file_path, 'r', encoding='utf-8') as file:
        file_contents = file.read()

    answer = infinite_context('YOUR QUESTION', file_contents)
    print(answer)
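As a rough sense of cost: with the default chunk_size of 1000 characters, a 1,000,000-character context becomes 1,000 chunks, so the first pass issues 1,000 classification calls; each later pass only classifies the chunks that survived the previous one, and the final answer call only sees what is left.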

Comparison

Context reduction is usually done via RAG with embeddings, but with the rise of cheap reasoning models we can perform a much better and more detailed search by using the model's capabilities directly on the text.
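For contrast, a classic embedding-based reduction of the same chunks might look like the sketch below. The embedding model name and the top-k value are assumptions, and the client has to point at a provider that actually serves embeddings (e.g. OpenAI), since the script above only needs chat completions.

import numpy as np

# Sketch of the usual embedding-based alternative: rank chunks by cosine
# similarity to the question and keep the top k. Model name and k are
# illustrative assumptions.
def rag_top_k(question, chunks, client, k=20, embedding_model="text-embedding-3-small"):
    texts = [question] + [c["text"] for c in chunks]
    response = client.embeddings.create(model=embedding_model, input=texts)
    vectors = np.array([item.embedding for item in response.data])
    q, docs = vectors[0], vectors[1:]
    # Cosine similarity between the question vector and every chunk vector.
    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

Embeddings are cheaper per chunk, but a chat model can weigh the question against each chunk in far more detail, which is the trade-off this approach is built around.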

Full code GitHub link: Click

