Harshith Halejolad

Plug-and-Play Context Compression for Any LLM API — CRISP

I recently built CRISP (Compressed Retrieval and Intelligent Semantic Processing), a Python library that acts as a lightweight, modular wrapper around any LLM API. It allows you to process massive conversation histories and complex datasets (like meeting transcripts) while only sending the most semantically dense context to your API provider.

The goal is simple: maximize the information-per-token ratio in LLM pipelines.


The Problem

Prompts often carry context bloat: redundant, repetitive, low-signal text. At the scale of tens to hundreds of thousands of tokens, that bloat inflates cost, increases latency, and buries the signal in noise.

Optimizing this typically involves advanced, time-consuming solutions such as:

  • complex RAG pipelines
  • custom embedding + vector database setups
  • multi-stage preprocessing systems

But these come with tradeoffs:

  • heavy setup and infrastructure overhead
  • harder integration into existing workflows
  • not truly plug-and-play for most developers

As a result, developers either:

  • over-engineer their stack
  • or settle for inefficient prompt stuffing

There’s a clear gap for something that is:

  • significantly more efficient than naive prompting
  • but still simple, lightweight, and plug-and-play

This is the gap that CRISP is designed to fill.


Core Features of CRISP

  • Zero manual setup: Automated dependency management, with no local LLM runtimes required
  • Deterministic processing: Uses TextRank for semantic extraction, ensuring consistent results
  • Massive compression: Typically achieves 95–99% reduction from raw history to final prompt
  • Privacy-first design: Local-first architecture with plain text storage and an isolated vector database
  • API agnostic: Generates high-density RAG prompts optimized for any provider (Groq, OpenAI, Anthropic, Gemini, etc.)

Architecture Summary

CRISP is built as a modular pipeline of deterministic components:

  • Semantic memory: Summarizes each turn to under 10% of its original length before logging it to a .txt file
  • Embedder: Uses all-MiniLM-L6-v2 for high-speed 384-dimensional semantic matching
  • Retriever: Manages persistent ChromaDB collections per wrapper instance
  • Compressor: A multi-stage engine performing TextRank extraction, redundancy filtering, filler-word removal, and rule-based stripping
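The retriever stage boils down to nearest-neighbor search over embeddings. As a dependency-free illustration of the ranking step (the real pipeline uses ChromaDB over all-MiniLM-L6-v2's 384-dimensional vectors; the toy 4-dimensional vectors below are stand-ins), cosine similarity is enough:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, memory, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 4-dimensional "embeddings" standing in for MiniLM's 384 dimensions.
memory = [
    ("pricing tier A is $10/mo", [0.9, 0.1, 0.0, 0.0]),
    ("the office plants need water", [0.0, 0.0, 0.9, 0.4]),
    ("tier B adds SSO for $25/mo", [0.8, 0.3, 0.1, 0.0]),
]
query = [1.0, 0.2, 0.0, 0.0]  # e.g. the embedding of "what are the pricing tiers?"
print(top_k(query, memory))   # the two pricing turns rank above the irrelevant one
```

Only the retrieved turns are handed to the compressor, which is what keeps the final prompt small.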

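The compressor's later stages (filler-word removal and redundancy filtering) can be sketched in plain Python. This is an illustrative stand-in, not CRISP's actual code, and it substitutes simple sentence deduplication for the TextRank extraction step:

```python
import re

# Filler phrases of the kind found in the transcript ("uh", "um", "so basically").
FILLERS = re.compile(r"\b(uh|um|you know|so basically)\b[,\s]*", re.IGNORECASE)

def strip_fillers(text):
    """Rule-based stripping: drop filler phrases and collapse whitespace."""
    return re.sub(r"\s+", " ", FILLERS.sub("", text)).strip()

def dedupe_sentences(text):
    """Redundancy filtering: keep the first occurrence of each normalized sentence."""
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = sentence.lower().strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)

def compress(text):
    return dedupe_sentences(strip_fillers(text))

raw = ("Um, so basically the premium tier is $25/mo. "
       "The premium tier is $25/mo. "
       "Uh, we also agreed on a 10% annual discount.")
print(compress(raw))
```

Both stages are deterministic, which matches the design goal of consistent results across runs.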
Real-World Use Case

The primary test case involved processing a messy, repetitive 30,000+ character meeting transcript with 20 speakers. The results were as follows:

  • Input history: ~30,672 characters (filled with "uh", "um", "so basically", etc.)
  • CRISP processing: semantic retrieval of pricing facts, deduplication, and stripping
  • Output context: 422 characters
  • Net reduction: 98.6%
  • Final LLM response: precise extraction of the 3 pricing tiers and discount decisions with zero hallucinations
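The headline figure follows directly from the character counts above:

```python
input_chars = 30_672   # raw transcript
output_chars = 422     # compressed context sent to the LLM
reduction = (1 - output_chars / input_chars) * 100
print(f"{reduction:.1f}% reduction")  # 98.6% reduction
```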

A full demo of this use case is available in the CRISP repository:
https://github.com/Antiproton2023/crisp


Installation

CRISP is not yet published on PyPI, so clone the repository and install its dependencies:

```shell
# Clone the repository
git clone https://github.com/Antiproton2023/crisp.git
cd crisp

# Install required dependencies
pip install -r requirements.txt
```

Alternatively, CRISP will auto-install core dependencies like chromadb and summa on first import.
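A common pattern for this kind of first-import auto-install is to fall back to pip when an import fails. The helper below is a sketch of the general technique, not necessarily CRISP's exact mechanism:

```python
import importlib
import subprocess
import sys

def ensure(module_name, pip_name=None):
    """Import module_name, installing it with pip on first failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install into the current interpreter's environment, then retry.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        return importlib.import_module(module_name)

# Already-installed modules import without ever touching pip:
json = ensure("json")
print(json.dumps({"ok": True}))
```

The convenience comes with a tradeoff: implicit installs at import time are surprising in locked-down environments, so pinning chromadb and summa in requirements.txt remains the safer option for production.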

Each part of the pipeline is modular, so you can customize it to fit your needs. If you build something useful, consider contributing to the project on GitHub.
