What are Chunks?

Chunking is the process of breaking large ingested documents into smaller, manageable pieces that can be processed individually. This step is necessary because language models have token limits: they can only process a limited amount of text at once. When someone asks a question, your RAG system retrieves the relevant chunks and includes them in the prompt sent to the language model. If your chunks are too large, you'll exceed the model's token limit and won't be able to include all the relevant information.

Language models work with tokens—basic units of text that can be words, parts of words, or punctuation. Different models have different token limits: some handle 4,000 tokens, others can process 128,000 tokens or more. The token limit includes everything in your prompt: the user's question, the retrieved chunks, and any instructions for the model.
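
To get a feel for this, you can count tokens directly. Below is a minimal sketch using OpenAI's tiktoken library; the `cl100k_base` encoding is an assumption, so swap in whichever encoding matches your target model.

```python
# Minimal token-counting sketch using OpenAI's tiktoken library.
# "cl100k_base" is an assumed encoding; use the one for your model.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Chunking breaks large documents into smaller, manageable pieces."
tokens = encoding.encode(text)

print(f"{len(tokens)} tokens")
# Inspect how the text was split into token pieces:
print([encoding.decode_single_token_bytes(t) for t in tokens])
```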

Without proper chunking, you face two main problems: exceeding token limits and reduced precision. Large documents can exceed the number of tokens the model can process, causing errors or truncation. And even when a document contains the right answer, burying it in a mass of unrelated text makes it harder for the model to find and use, reducing precision.
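
To make the token-limit problem concrete, here is a sketch of how a RAG system might pack retrieved chunks into a prompt under a fixed budget. The budget numbers and the `retrieved_chunks` input are illustrative placeholders, and tokens are approximated as words to keep the example dependency-free.

```python
# Sketch of packing retrieved chunks into a prompt under a token budget.
# Retrieval itself is out of scope; `retrieved_chunks` stands in for
# whatever your vector store returns, assumed sorted by relevance.

TOKEN_BUDGET = 4000   # model's context limit (varies by model)
RESERVED = 500        # room for instructions and the model's answer

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Pack as many retrieved chunks as fit under the budget,
    approximating tokens as whitespace-separated words."""
    budget = TOKEN_BUDGET - RESERVED - len(question.split())
    context_parts = []
    for chunk in retrieved_chunks:
        cost = len(chunk.split())
        if cost > budget:
            break  # this chunk (and the rest) won't fit
        context_parts.append(chunk)
        budget -= cost
    context = "\n\n".join(context_parts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In practice you would count real tokens with your model's tokenizer rather than words, but the shape of the problem is the same: oversized chunks crowd out other relevant ones.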

You can chunk your data using two main strategies:

Context-aware chunking: Divide documents based on their natural structure, such as sentences, paragraphs, or sections. This preserves the logical flow of information but creates variable-sized chunks. You can also include metadata like titles or section headers to provide more context.

Fixed-size chunking: Divide documents into chunks of a predetermined size (for example, 500 tokens each). This approach is simple and computationally efficient, but might split content at awkward places.
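
The sketch below shows one plain-Python interpretation of each strategy. Sizes are measured in words rather than real tokens to keep it self-contained, and the paragraph-splitting rule (blank lines) is an assumption about your documents' structure.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Fixed-size chunking: split the text into windows of roughly
    chunk_size words each, regardless of sentence or paragraph breaks."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

def context_aware_chunks(text: str, max_words: int = 500) -> list[str]:
    """Context-aware chunking: split on paragraph boundaries, packing
    consecutive paragraphs together until the size budget is reached."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        para_len = len(para.split())
        if current and current_len + para_len > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        # Note: a single paragraph longer than max_words still becomes
        # its own oversized chunk in this simple sketch.
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

As described above, the fixed-size version is simple but can split content at awkward places, while the context-aware version keeps paragraphs intact at the cost of variable chunk sizes.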
