Building a good LLM prompt is not a one-shot task. You assemble the pieces: a system message, a few examples, some context, the actual instruction. Then you iterate. You compress things that are too long, test whether the output still holds up, check how many tokens you are spending, and save versions so you can roll back when something breaks.
Most developers do this in a text editor, a notebook, or scattered across a handful of scripts. There is no single place where you can see the whole context window, manipulate it visually, compress a block, run a live test, and save a snapshot, all without switching tools.
ContextCraft is that place. It is a canvas-based interactive workbench for assembling, compressing, testing, and versioning LLM context windows. It runs locally, connects to Ollama for local compression and testing, supports OpenRouter for cloud LLM testing, and exports directly to OpenAI, Anthropic, LangChain, and JSON formats.
Features
Visual Canvas: Drag-and-drop interface for organizing prompt blocks, with real-time token counting and visual progress bars.
Smart Compression: AI-powered compression using Ollama with semantic preservation. Set a target compression ratio, choose whether to preserve structure, and review a before/after comparison before applying.
Coverage Analysis: Semantic similarity scoring between original and compressed content. Key concept preservation is surfaced as a score so you know exactly what you are trading for token savings.
LLM Testing: Test prompts with streaming responses from Ollama or OpenRouter directly from the canvas. Select provider, model, and temperature and view responses in real time.
Version Control: Save and restore canvas versions with a SQLite backend. Name versions for easy reference and compare two versions to see what changed.
Multi-Format Export: Export to OpenAI, Anthropic, LangChain, and JSON formats. Copy the generated code and paste directly into your application.
Block Library: Pre-built starter blocks for common use cases, available from the sidebar. Add your own blocks to the library for reuse across canvases.
Architecture
ContextCraft is split into a FastAPI backend and a React + Vite frontend.
The backend handles token counting via tiktoken, semantic similarity analysis via sentence-transformers, compression via Ollama, streaming LLM test responses via Ollama or OpenRouter, SQLite-backed version management, and export format generation. The frontend renders the visual canvas with drag-and-drop via @hello-pangea/dnd and code editing via CodeMirror.
contextcraft/
├── server/ # FastAPI backend
│ ├── main.py # FastAPI app entry point
│ ├── models.py # Pydantic data models
│ ├── tokenizer.py # Token counting (tiktoken)
│ ├── coverage.py # Semantic similarity analysis
│ ├── compress.py # Ollama compression service
│ ├── tester.py # LLM streaming test service
│ ├── export.py # Export format generators
│ ├── versions.py # SQLite version management
│ └── pricing.py # OpenRouter pricing API
├── frontend/ # React + Vite frontend
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ └── App.jsx # Main application
│ └── package.json
├── cli/ # CLI entry point
│ └── main.py
└── pyproject.toml # Python package config
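To make the backend concrete: the heart of tokenizer.py is an encode-and-count on a tiktoken encoding. A minimal sketch of that idea, assuming a model-to-encoding lookup with a fallback; the real module may handle models differently:

# Minimal sketch of tiktoken-based token counting (illustrative; the
# actual tokenizer.py may differ).
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model names (e.g. Claude) fall back to a general-purpose
        # encoding, so counts for non-OpenAI models are approximations.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("You are a helpful assistant."))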
Getting Started
Prerequisites
- Python 3.9+
- Node.js 18+
- Ollama (optional, for local compression and testing)
- OpenRouter API key (optional, for cloud LLM testing)
Installation
# Clone the repository
git clone https://github.com/contextcraft/contextcraft.git
cd contextcraft
# Install Python dependencies
pip install -e ".[dev]"
# Install frontend dependencies
cd frontend
npm install
cd ..
# Initialize the database
contextcraft init-db
Running the Application
# Start the server and frontend
contextcraft serve
# Or start with custom options
contextcraft serve --host 0.0.0.0 --port 8000 --frontend-port 3000
With the default options, the frontend is available at http://localhost:5173 and the API docs at http://localhost:8000/docs.
Configuration
Create a .env file in the project root:
# OpenRouter API key (for cloud LLM testing)
OPENROUTER_API_KEY=your_api_key_here
# Ollama URL (default: http://localhost:11434)
OLLAMA_URL=http://localhost:11434
# Default compression model
DEFAULT_COMPRESSION_MODEL=gemma2:2b
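The backend reads these values at startup. A minimal sketch of how that could look using python-dotenv; this is an assumption, and the project may load configuration differently:

# Sketch of reading the .env values above (assumes python-dotenv is
# installed; the actual server may load configuration another way).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")  # None if unset
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
DEFAULT_COMPRESSION_MODEL = os.getenv("DEFAULT_COMPRESSION_MODEL", "gemma2:2b")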
Supported Models
Token Counting: GPT-4, GPT-4o, GPT-4o Mini, GPT-3.5 Turbo, Claude 3 Opus, Sonnet, Haiku, Claude 3.5 Sonnet
Compression (via Ollama): gemma2:2b (default), any Ollama-compatible model
Testing: Ollama local models, OpenRouter cloud models (requires API key)
Usage Guide
Creating a Canvas: Start with an empty canvas or load from the library. Add blocks using the sidebar buttons or drag from the library. Arrange blocks by dragging to reorder. Edit block content inline or in the full editor.
Compressing Content: Click the compress icon on any block. Set the target compression ratio (0.1 to 0.9). Choose whether to preserve structure. Review the before/after comparison. Apply compression when satisfied.
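Compression is also available programmatically as a single POST. A hypothetical request with the requests library; the field names (text, target_ratio, preserve_structure) are assumptions, so check the generated docs at localhost:8000/docs for the real schema:

# Hypothetical call to the compression endpoint. Field names are
# assumptions; verify against the live API docs.
import requests

resp = requests.post(
    "http://localhost:8000/api/compress",
    json={
        "text": "A long context block that needs to be shortened...",
        "target_ratio": 0.5,         # aim for roughly half the tokens
        "preserve_structure": True,  # keep headings and list formatting
    },
)
resp.raise_for_status()
print(resp.json())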
Testing Prompts: Add your prompt blocks to the canvas. Click the Test button. Select provider (Ollama or OpenRouter). Choose model and set temperature. View streaming responses in real time.
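The same streaming test is reachable over HTTP. A hypothetical consumer for POST /api/test; the payload fields are assumptions, and the endpoint may stream SSE or JSON lines rather than plain text chunks:

# Hypothetical streaming consumer for POST /api/test. Payload fields
# and the plain-text chunk format are assumptions.
import requests

with requests.post(
    "http://localhost:8000/api/test",
    json={
        "provider": "ollama",
        "model": "gemma2:2b",
        "temperature": 0.7,
        "prompt": "Summarize the context above in two sentences.",
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None):
        print(chunk.decode("utf-8", errors="ignore"), end="", flush=True)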
Analyzing Coverage: Compress one or more blocks. Click the Coverage button. View semantic similarity scores. Check key concept preservation.
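Under the hood, scoring of this kind is typically embedding cosine similarity. A sketch of the general technique with sentence-transformers, for illustration; coverage.py may compute its score differently:

# Illustration of embedding-based similarity, the general technique
# behind coverage analysis (the real coverage.py may differ in detail).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
original = "The assistant must always cite sources and refuse legal advice."
compressed = "Always cite sources; refuse legal advice."

emb = model.encode([original, compressed], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()  # 1.0 means identical meaning
print(f"semantic similarity: {score:.3f}")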
Managing Versions: Click Versions to save the current state. Name your version for easy reference. Restore previous versions at any time. Compare versions to see changes.
Exporting: Click Export when ready. Choose a format: OpenAI, Anthropic, LangChain, or JSON. Copy the generated code. Paste it into your application.
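For reference, an OpenAI-format export should resemble a standard messages array. A hypothetical example of the shape of the generated code; the exact output may differ:

# Hypothetical shape of an OpenAI-format export (requires OPENAI_API_KEY
# in the environment; the exact generated code may differ).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached context."},
    ],
)
print(response.choices[0].message.content)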
API Endpoints
Token Management
- POST /api/tokenize - count tokens for text or blocks
- GET /api/pricing - get model pricing information
Compression
- POST /api/compress - compress text using Ollama
- POST /api/coverage - analyze semantic coverage between original and compressed
Testing
- POST /api/test - stream LLM responses from Ollama or OpenRouter
Versioning
- GET /api/versions - list all versions
- POST /api/versions - save a new version
- GET /api/versions/{id} - get a specific version
- POST /api/versions/{id}/restore - restore a version
- POST /api/versions/compare - compare two versions
Export
- POST /api/export - export canvas to OpenAI, Anthropic, LangChain, or JSON format
Library
- GET /api/library - get starter block library
- POST /api/library - add a block to the library
CLI Commands
# Start the application
contextcraft serve
# Initialize database
contextcraft init-db
# Add a block to library
contextcraft add-block --type system --label "My Template" --content "..."
# Get help
contextcraft --help
Docker
# Build image
docker build -t contextcraft .
# Run container
docker run -p 8000:8000 -p 5173:5173 contextcraft
Development
Backend
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black server/ cli/
isort server/ cli/
# Type checking
mypy server/ cli/
Frontend
cd frontend
# Start dev server
npm run dev
# Build for production
npm run build
# Run linter
npm run lint
Contributing
Fork the repository. Create a feature branch:
git checkout -b feature/amazing-feature
Commit your changes:
git commit -m 'Add amazing feature'
Push to the branch:
git push origin feature/amazing-feature
Then open a Pull Request.
How I Built This Using NEO
This tool was designed, built, debugged, and iterated entirely using NEO, an autonomous AI engineering agent that writes, runs, and refines real code end-to-end.
ContextCraft is a full-stack application: a FastAPI backend, a React + Vite frontend, a CLI, and a SQLite-backed versioning layer. Every part of the system was generated and connected through NEO: the backend services for token counting, semantic coverage analysis, compression via Ollama, streaming LLM testing, export pipelines, version management, and pricing integration, along with the interactive frontend canvas for assembling prompt blocks with drag-and-drop, inline editing, and real-time token tracking.
The compression and coverage pipeline, the live testing flow across Ollama and OpenRouter, the version save/restore and comparison system, and the multi-format export layer were all built end-to-end from a high-level problem description. NEO handled the full cycle: generating code, wiring components, resolving issues, and refining the system into a working product.
How You Can Use and Extend This With NEO
Prompt engineering workbench: Instead of iterating on prompts in a text editor and manually counting tokens, assemble your context window visually, compress blocks that are too long, and test the result, all in one place. The version control means you never lose a working configuration while experimenting.
Evaluate compression quality before shipping: Before deploying a compressed prompt to production, run coverage analysis to get a semantic similarity score between the original and compressed version. You know exactly how much meaning you are trading for token savings, not just a token count but an actual semantic measurement.
Manage prompt libraries across projects: The block library lets you save reusable prompt blocks and load them into any canvas. Teams building multiple LLM products can maintain a shared library of tested, versioned prompt components.
Extend it with additional export formats: The export module currently supports OpenAI, Anthropic, LangChain, and JSON. Adding a new format follows the same pattern in export.py and surfaces automatically in the Export UI without touching any other part of the stack.
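A hypothetical sketch of what such a generator could look like; the function name, the blocks shape, and the EXPORTERS registry are all assumptions about how export.py is organized:

# Hypothetical new format generator for export.py. The signature and
# the registry are assumptions, not the module's actual structure.
import json

def export_litellm(blocks: list[dict]) -> str:
    """Render canvas blocks as a LiteLLM-style messages payload."""
    messages = [
        {"role": b.get("type", "user"), "content": b["content"]}
        for b in blocks
    ]
    return json.dumps({"messages": messages}, indent=2)

# Registering the generator would make the format selectable in the UI:
# EXPORTERS["litellm"] = export_litellm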
Final Notes
Context window management is one of those problems that looks simple until you are doing it seriously. ContextCraft brings together the pieces that are usually scattered across different tools (visual assembly, token counting, AI compression, semantic coverage analysis, live testing, version control, and export) into a single local workbench.
The code is at https://github.com/dakshjain-1616/ContextCraft
You can also build with NEO in your IDE using the VS Code extension or Cursor.
You can also use NEO MCP with Claude Code: https://heyneo.com/claude-code
