Harshith Halejolad

Plug-and-Play Context Compression for Any LLM API — CRISP

I recently built CRISP (Compressed Retrieval and Intelligent Semantic Processing), a Python library that acts as a lightweight, modular wrapper around any LLM API. It allows you to process massive conversation histories and complex datasets (like meeting transcripts) while only sending the most semantically dense context to your API provider.

The goal is simple: maximize the information-per-token ratio in LLM pipelines.


The Problem

Prompts often carry context bloat: redundant, repetitive, low-signal text. At the scale of tens to hundreds of thousands of tokens, that bloat inflates cost, increases latency, and buries the signal in noise.

Optimizing this typically involves advanced, time-consuming solutions such as:

  • complex RAG pipelines
  • custom embedding + vector database setups
  • multi-stage preprocessing systems

But these come with tradeoffs:

  • heavy setup and infrastructure overhead
  • harder integration into existing workflows
  • not truly plug-and-play for most developers

As a result, developers either:

  • over-engineer their stack
  • or settle for inefficient prompt stuffing

There’s a clear gap for something that is:

  • significantly more efficient than naive prompting
  • but still simple, lightweight, and plug-and-play

This is the gap that CRISP is designed to fill.


Core Features of CRISP

  • Zero manual setup: Automated dependency management, with no local LLM runtimes required
  • Deterministic processing: Uses TextRank for semantic extraction, ensuring consistent results
  • Massive compression: Typically achieves 95–99% reduction from raw history to final prompt
  • Privacy-first design: Local-first architecture with plain text storage and an isolated vector database
  • API agnostic: Generates high-density RAG prompts optimized for any provider (Groq, OpenAI, Anthropic, Gemini, etc.)

Architecture Summary

CRISP is built as a modular pipeline of deterministic components:

  • Semantic memory: Summarizes each turn to under 10% of its original length before logging it to a .txt file
  • Embedder: Uses all-MiniLM-L6-v2 for high-speed 384-dimensional semantic matching
  • Retriever: Manages persistent ChromaDB collections per wrapper instance
  • Compressor: A multi-stage engine performing TextRank extraction, redundancy filtering, filler-word removal, and rule-based stripping
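The retriever stage boils down to nearest-neighbor search over embeddings. As a dependency-free illustration of the ranking step (the real pipeline uses ChromaDB over all-MiniLM-L6-v2's 384-dimensional vectors; the toy 4-dimensional vectors below are stand-ins), cosine similarity is enough:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, memory, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 4-dimensional "embeddings" standing in for MiniLM's 384 dimensions.
memory = [
    ("pricing tier A is $10/mo", [0.9, 0.1, 0.0, 0.0]),
    ("the office plants need water", [0.0, 0.0, 0.9, 0.4]),
    ("tier B adds SSO for $25/mo", [0.8, 0.3, 0.1, 0.0]),
]
query = [1.0, 0.2, 0.0, 0.0]  # e.g. the embedding of "what are the pricing tiers?"
print(top_k(query, memory))   # the two pricing turns rank above the irrelevant one
```

Only the retrieved turns are handed to the compressor, which is what keeps the final prompt small.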

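The compressor's later stages (filler-word removal and redundancy filtering) can be sketched in plain Python. This is an illustrative stand-in, not CRISP's actual code, and it substitutes simple sentence deduplication for the TextRank extraction step:

```python
import re

# Filler phrases of the kind found in the transcript ("uh", "um", "so basically").
FILLERS = re.compile(r"\b(uh|um|you know|so basically)\b[,\s]*", re.IGNORECASE)

def strip_fillers(text):
    """Rule-based stripping: drop filler phrases and collapse whitespace."""
    return re.sub(r"\s+", " ", FILLERS.sub("", text)).strip()

def dedupe_sentences(text):
    """Redundancy filtering: keep the first occurrence of each normalized sentence."""
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = sentence.lower().strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)

def compress(text):
    return dedupe_sentences(strip_fillers(text))

raw = ("Um, so basically the premium tier is $25/mo. "
       "The premium tier is $25/mo. "
       "Uh, we also agreed on a 10% annual discount.")
print(compress(raw))
```

Both stages are deterministic, which matches the design goal of consistent results across runs.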
Real-World Use Case

The primary test case involved processing a messy, repetitive 30,000+ character meeting transcript with 20 speakers. The results were as follows:

  • Input history: ~30,672 characters (filled with "uh", "um", "so basically", etc.)
  • CRISP processing: semantic retrieval of pricing facts, deduplication, and stripping
  • Output context: 422 characters
  • Net reduction: 98.6%
  • Final LLM response: precise extraction of the 3 pricing tiers and discount decisions with zero hallucinations
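The headline figure follows directly from the character counts above:

```python
input_chars = 30_672   # raw transcript
output_chars = 422     # compressed context sent to the LLM
reduction = (1 - output_chars / input_chars) * 100
print(f"{reduction:.1f}% reduction")  # 98.6% reduction
```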

A full demo of this use case is available in the CRISP repository:
https://github.com/Antiproton2023/crisp


Installation

CRISP is not yet published on PyPI, so clone the repository and install its dependencies:

```shell
# Clone the repository
git clone https://github.com/Antiproton2023/crisp.git
cd crisp

# Install required dependencies
pip install -r requirements.txt
```

Alternatively, CRISP will auto-install core dependencies like chromadb and summa on first import.
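A common pattern for this kind of first-import auto-install is to fall back to pip when an import fails. The helper below is a sketch of the general technique, not necessarily CRISP's exact mechanism:

```python
import importlib
import subprocess
import sys

def ensure(module_name, pip_name=None):
    """Import module_name, installing it with pip on first failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install into the current interpreter's environment, then retry.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        return importlib.import_module(module_name)

# Already-installed modules import without ever touching pip:
json = ensure("json")
print(json.dumps({"ok": True}))
```

The convenience comes with a tradeoff: implicit installs at import time are surprising in locked-down environments, so pinning chromadb and summa in requirements.txt remains the safer option for production.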

Each part of the pipeline is modular, so you can customize it to fit your needs. If you build something useful, consider contributing to the project on GitHub.
