Building a good LLM prompt is not a one-shot task. You assemble the pieces: a system message, a few examples, some context, the actual instruction. Then you iterate. You compress things that are too long, test whether the output still holds up, check how many tokens you are spending, and save versions so you can roll back when something breaks.
Most developers do this in a text editor, a notebook, or scattered across a handful of scripts. There is no single place where you can see the whole context window, manipulate it visually, compress a block, run a live test, and save a snapshot, all without switching tools.
ContextCraft is that place. It is a canvas-based interactive workbench for assembling, compressing, testing, and versioning LLM context windows. It runs locally, connects to Ollama for local compression and testing, supports OpenRouter for cloud LLM testing, and exports directly to OpenAI, Anthropic, LangChain, and JSON formats.
Features
Visual Canvas: Drag-and-drop interface for organizing prompt blocks, with real-time token counting and visual progress bars.
Smart Compression: AI-powered compression using Ollama with semantic preservation. Set a target compression ratio, choose whether to preserve structure, and review a before/after comparison before applying.
Coverage Analysis: Semantic similarity scoring between original and compressed content. Key concept preservation is surfaced as a score so you know exactly what you are trading for token savings.
LLM Testing: Test prompts with streaming responses from Ollama or OpenRouter directly from the canvas. Select provider, model, and temperature and view responses in real time.
Version Control: Save and restore canvas versions with a SQLite backend. Name versions for easy reference and compare two versions to see what changed.
Multi-Format Export: Export to OpenAI, Anthropic, LangChain, and JSON formats. Copy the generated code and paste directly into your application.
Block Library: Pre-built starter blocks for common use cases, available from the sidebar. Add your own blocks to the library for reuse across canvases.
Architecture
ContextCraft is split into a FastAPI backend and a React + Vite frontend.
The backend handles token counting via tiktoken, semantic similarity analysis via sentence-transformers, compression via Ollama, streaming LLM test responses via Ollama or OpenRouter, SQLite-backed version management, and export format generation. The frontend renders the visual canvas with drag-and-drop via @hello-pangea/dnd and code editing via CodeMirror.
contextcraft/
├── server/ # FastAPI backend
│ ├── main.py # FastAPI app entry point
│ ├── models.py # Pydantic data models
│ ├── tokenizer.py # Token counting (tiktoken)
│ ├── coverage.py # Semantic similarity analysis
│ ├── compress.py # Ollama compression service
│ ├── tester.py # LLM streaming test service
│ ├── export.py # Export format generators
│ ├── versions.py # SQLite version management
│ └── pricing.py # OpenRouter pricing API
├── frontend/ # React + Vite frontend
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ └── App.jsx # Main application
│ └── package.json
├── cli/ # CLI entry point
│ └── main.py
└── pyproject.toml # Python package config
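To make the backend concrete: the heart of tokenizer.py is an encode-and-count on a tiktoken encoding. A minimal sketch of that idea, assuming a model-to-encoding lookup with a fallback; the real module may handle models differently:

# Minimal sketch of tiktoken-based token counting (illustrative; the
# actual tokenizer.py may differ).
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model names (e.g. Claude) fall back to a general-purpose
        # encoding, so counts for non-OpenAI models are approximations.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("You are a helpful assistant."))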
Getting Started
Prerequisites
- Python 3.9+
- Node.js 18+
- Ollama (optional, for local compression and testing)
- OpenRouter API key (optional, for cloud LLM testing)
Installation
# Clone the repository
git clone https://github.com/contextcraft/contextcraft.git
cd contextcraft
# Install Python dependencies
pip install -e ".[dev]"
# Install frontend dependencies
cd frontend
npm install
cd ..
# Initialize the database
contextcraft init-db
Running the Application
# Start the server and frontend
contextcraft serve
# Or start with custom options
contextcraft serve --host 0.0.0.0 --port 8000 --frontend-port 3000
With the default options, the frontend is available at http://localhost:5173 and the API docs at http://localhost:8000/docs.
Configuration
Create a .env file in the project root:
# OpenRouter API key (for cloud LLM testing)
OPENROUTER_API_KEY=your_api_key_here
# Ollama URL (default: http://localhost:11434)
OLLAMA_URL=http://localhost:11434
# Default compression model
DEFAULT_COMPRESSION_MODEL=gemma2:2b
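The backend reads these values at startup. A minimal sketch of how that could look using python-dotenv; this is an assumption, and the project may load configuration differently:

# Sketch of reading the .env values above (assumes python-dotenv is
# installed; the actual server may load configuration another way).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")  # None if unset
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
DEFAULT_COMPRESSION_MODEL = os.getenv("DEFAULT_COMPRESSION_MODEL", "gemma2:2b")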
Supported Models
Token Counting: GPT-4, GPT-4o, GPT-4o Mini, GPT-3.5 Turbo, Claude 3 Opus, Sonnet, Haiku, Claude 3.5 Sonnet
Compression (via Ollama): gemma2:2b (default), any Ollama-compatible model
Testing: Ollama local models, OpenRouter cloud models (requires API key)
Usage Guide
Creating a Canvas: Start with an empty canvas or load from the library. Add blocks using the sidebar buttons or drag from the library. Arrange blocks by dragging to reorder. Edit block content inline or in the full editor.
Compressing Content: Click the compress icon on any block. Set the target compression ratio (0.1 to 0.9). Choose whether to preserve structure. Review the before/after comparison. Apply compression when satisfied.
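Compression is also available programmatically as a single POST. A hypothetical request with the requests library; the field names (text, target_ratio, preserve_structure) are assumptions, so check the generated docs at localhost:8000/docs for the real schema:

# Hypothetical call to the compression endpoint. Field names are
# assumptions; verify against the live API docs.
import requests

resp = requests.post(
    "http://localhost:8000/api/compress",
    json={
        "text": "A long context block that needs to be shortened...",
        "target_ratio": 0.5,         # aim for roughly half the tokens
        "preserve_structure": True,  # keep headings and list formatting
    },
)
resp.raise_for_status()
print(resp.json())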
Testing Prompts: Add your prompt blocks to the canvas. Click the Test button. Select provider (Ollama or OpenRouter). Choose model and set temperature. View streaming responses in real time.
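The same streaming test is reachable over HTTP. A hypothetical consumer for POST /api/test; the payload fields are assumptions, and the endpoint may stream SSE or JSON lines rather than plain text chunks:

# Hypothetical streaming consumer for POST /api/test. Payload fields
# and the plain-text chunk format are assumptions.
import requests

with requests.post(
    "http://localhost:8000/api/test",
    json={
        "provider": "ollama",
        "model": "gemma2:2b",
        "temperature": 0.7,
        "prompt": "Summarize the context above in two sentences.",
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None):
        print(chunk.decode("utf-8", errors="ignore"), end="", flush=True)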
Analyzing Coverage: Compress one or more blocks. Click the Coverage button. View semantic similarity scores. Check key concept preservation.
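Under the hood, scoring of this kind is typically embedding cosine similarity. A sketch of the general technique with sentence-transformers, for illustration; coverage.py may compute its score differently:

# Illustration of embedding-based similarity, the general technique
# behind coverage analysis (the real coverage.py may differ in detail).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
original = "The assistant must always cite sources and refuse legal advice."
compressed = "Always cite sources; refuse legal advice."

emb = model.encode([original, compressed], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()  # 1.0 means identical meaning
print(f"semantic similarity: {score:.3f}")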
Managing Versions: Click Versions to save the current state. Name your version for easy reference. Restore previous versions at any time. Compare versions to see changes.
Exporting: Click Export when ready. Choose a format: OpenAI, Anthropic, LangChain, or JSON. Copy the generated code. Paste it into your application.
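For reference, an OpenAI-format export should resemble a standard messages array. A hypothetical example of the shape of the generated code; the exact output may differ:

# Hypothetical shape of an OpenAI-format export (requires OPENAI_API_KEY
# in the environment; the exact generated code may differ).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached context."},
    ],
)
print(response.choices[0].message.content)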
API Endpoints
Token Management
- POST /api/tokenize - count tokens for text or blocks
- GET /api/pricing - get model pricing information
Compression
- POST /api/compress - compress text using Ollama
- POST /api/coverage - analyze semantic coverage between original and compressed
Testing
- POST /api/test - stream LLM responses from Ollama or OpenRouter
Versioning
- GET /api/versions - list all versions
- POST /api/versions - save a new version
- GET /api/versions/{id} - get a specific version
- POST /api/versions/{id}/restore - restore a version
- POST /api/versions/compare - compare two versions
Export
- POST /api/export - export canvas to OpenAI, Anthropic, LangChain, or JSON format
Library
- GET /api/library - get starter block library
- POST /api/library - add a block to the library
CLI Commands
# Start the application
contextcraft serve
# Initialize database
contextcraft init-db
# Add a block to library
contextcraft add-block --type system --label "My Template" --content "..."
# Get help
contextcraft --help
Docker
# Build image
docker build -t contextcraft .
# Run container
docker run -p 8000:8000 -p 5173:5173 contextcraft
Development
Backend
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black server/ cli/
isort server/ cli/
# Type checking
mypy server/ cli/
Frontend
cd frontend
# Start dev server
npm run dev
# Build for production
npm run build
# Run linter
npm run lint
Contributing
Fork the repository. Create a feature branch:
git checkout -b feature/amazing-feature
Commit your changes:
git commit -m 'Add amazing feature'
Push to the branch:
git push origin feature/amazing-feature
Then open a Pull Request.
How I Built This Using NEO
This tool was designed, built, debugged, and iterated entirely using NEO, an autonomous AI engineering agent that writes, runs, and refines real code end-to-end.
ContextCraft is a full-stack application: a FastAPI backend, a React + Vite frontend, a CLI, and a SQLite-backed versioning layer. Every part of the system was generated and connected through NEO: the backend services for token counting, semantic coverage analysis, compression via Ollama, streaming LLM testing, export pipelines, version management, and pricing integration, along with the interactive frontend canvas for assembling prompt blocks with drag-and-drop, inline editing, and real-time token tracking.
The compression and coverage pipeline, the live testing flow across Ollama and OpenRouter, the version save/restore and comparison system, and the multi-format export layer were all built end-to-end from a high-level problem description. NEO handled the full cycle: generating code, wiring components, resolving issues, and refining the system into a working product.
How You Can Use and Extend This With NEO
Prompt engineering workbench: Instead of iterating on prompts in a text editor and manually counting tokens, assemble your context window visually, compress blocks that are too long, and test the result, all in one place. The version control means you never lose a working configuration while experimenting.
Evaluate compression quality before shipping: Before deploying a compressed prompt to production, run coverage analysis to get a semantic similarity score between the original and compressed version. You know exactly how much meaning you are trading for token savings, not just a token count but an actual semantic measurement.
Manage prompt libraries across projects: The block library lets you save reusable prompt blocks and load them into any canvas. Teams building multiple LLM products can maintain a shared library of tested, versioned prompt components.
Extend it with additional export formats: The export module currently supports OpenAI, Anthropic, LangChain, and JSON. Adding a new format follows the same pattern in export.py and surfaces automatically in the Export UI without touching any other part of the stack.
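A hypothetical sketch of what such a generator could look like; the function name, the blocks shape, and the EXPORTERS registry are all assumptions about how export.py is organized:

# Hypothetical new format generator for export.py. The signature and
# the registry are assumptions, not the module's actual structure.
import json

def export_litellm(blocks: list[dict]) -> str:
    """Render canvas blocks as a LiteLLM-style messages payload."""
    messages = [
        {"role": b.get("type", "user"), "content": b["content"]}
        for b in blocks
    ]
    return json.dumps({"messages": messages}, indent=2)

# Registering the generator would make the format selectable in the UI:
# EXPORTERS["litellm"] = export_litellm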
Final Notes
Context window management is one of those problems that looks simple until you are doing it seriously. ContextCraft brings together the pieces that are usually scattered across different tools (visual assembly, token counting, AI compression, semantic coverage analysis, live testing, version control, and export) into a single local workbench.
The code is at https://github.com/dakshjain-1616/ContextCraft
You can also build with NEO in your IDE using the VS Code extension or Cursor.
You can also use NEO MCP with Claude Code: https://heyneo.com/claude-code
